Ananthi Kaliyamoorthy * S. Manoharan **

* Assistant Professor, Department of Electronics and Instrumentation Engineering, Karpagam College of Engineering, Coimbatore.

**Professor, Department of Electronics and Instrumentation Engineering, Karpagam College of Engineering, Coimbatore.

Abstract

Today there is a growing need to reduce the power consumption of microprocessors, because they account for a major share of the power budget of equipment. Power consumption in microprocessors can be reduced by hardware, by software, or by both. The power consumption of processors increases with transistor count and clock frequency. Thus there is a need to design processors that consume less power without compromising performance and efficiency. Likewise, each instruction exercises a specific part of the microprocessor. Therefore, by selecting the right instructions it is possible to reduce the power consumed by the processor. This paper presents a survey of hardware and software power reduction techniques used for several microprocessors.

Low power is a key feature of portable electronic systems. Power is a valuable commodity, especially in mobile or embedded environments and in server farms. Low-power design is critical for extending battery life in portable and battery-operated devices, and in processor design power is now a first-class constraint on par with performance. The power dissipation of modern processors has been increasing rapidly along with transistor count and clock frequency. Although processor performance has increased, the power consumed has increased as well.

Figure 1 depicts the rise in the power densities of processors. Their power densities and concomitant heat generation are rapidly approaching levels comparable to nuclear reactors [1]. Figure 2 shows the power consumption trend of processors introduced by Intel over the past 15 years. The general trend is for maximum processor power consumption to increase by a factor of a little more than 2X every four years [2].

The paper is organized as follows: Section 1 describes the forms of power consumption in the processor, Section 2 discusses the methods used to determine power consumption in the processor, Section 3 gives the methods for reducing power consumption, Section 4 presents the challenges in low-power microprocessor design, and Section 5 concludes the paper.

The two main forms of power consumption in the processor are dynamic power consumption and static power consumption [3], [4].

1.1 Dynamic Power Consumption

Dynamic power consumption has two sources: a) switched capacitance and b) short-circuit current. Switched capacitance is the major source of dynamic power consumption. It arises from the charging and discharging of capacitors at the outputs of circuits. Short-circuit current is the second source of dynamic power consumption and accounts for only 10-15% of the total power consumption. It arises because circuits are composed of transistors of opposite polarity, negative (NMOS) and positive (PMOS).

1.2 Static Power Consumption

Static power consumption is mainly due to leakage current, which flows even when the transistors are not switching. The short-circuit current described in Section 1.1, by contrast, occurs when the NMOS and PMOS transistors are momentarily switched on at the same time. Researchers have not found a way to reduce it without sacrificing performance; nevertheless, it is only a small percentage of total power. In addition, the following power figures matter for microprocessors in battery-operated portable equipment [5]:

1.2.1 Average Power

The most popular definition is the average power consumed while a typical application program runs on the microprocessor. The “A/MHz” metric is used in discussions of average power consumption. This metric may correspond to battery life in portable equipment.

1.2.2 Maximum Power

The average power consumption changes drastically with the application software that the processor is running. The toggle ratio of data values in a program can change the average power consumption by more than 30%. The maximum power consumption of a microprocessor matters when designers choose the voltage regulator or design for the temperature distribution inside the processor.

1.2.3 Standby Power Consumption

Current microprocessors have standby mode features. After a program issues a standby instruction, the processor stops the internal clock distribution. In standby mode, the power consumption of the microprocessor is due only to leakage current. Many technologies that reduce leakage current are applied in low-power microprocessors, because standby current is one of the most important values in portable equipment. For example, a cellular phone needs high CPU performance during communication, but otherwise typically waits in standby.

1.2.4 Power Consumption in Power Off Mode

Standby leakage current has increased to the order of mA. Because portable equipment cannot tolerate leakage current in the mA range, another option is to cut the power supply to the microprocessor. However, in a real system, even a powered-off microprocessor should keep the values of its external pins stable to prevent leakage current at the board level. Some functions in the microprocessor, such as the real-time clock, must remain active even when the microprocessor's main power supply is cut. The realistic power consumption in power-off mode is therefore defined according to the system requirements.

1.2.5 Memory Retention Power Consumption

Because power-off mode requires the full initial boot procedure when the microprocessor wakes up, some microprocessors support a memory-retention power-off mode. In this mode, the contents of on-chip memory are kept. The power supply for the memory macro must be maintained even when the main power supply for the random logic in the microprocessor is shut down.

1.2.6 Sleep Mode Power Consumption

Depending on the requirements of system designers, microprocessor chips should offer a variety of operating modes. For example, the CPU clock can be off while the interrupt controller remains active, waiting for an interrupt signal from outside. A peripheral IP block can be powered off while the CPU is running. The definition of power consumption is therefore determined by the requirements of the system.

The power consumed by a processor can be determined by the following two methods. The first method is to measure the current drawn by the processor as it repeatedly executes certain instructions. This yields the information needed to evaluate the power cost of an entire program for that processor. To measure the current, an oscilloscope can be used with a shunt resistor connected in series with the supply voltage pin of the microprocessor. Alternatively, the current can be measured directly with an ammeter. The ammeter used must be able to handle high-frequency signals in order to obtain a stable measurement of the current drawn by the microprocessor.
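The shunt-resistor measurement can be turned into a power figure with Ohm's law. The following is a minimal sketch, assuming the oscilloscope reports the voltage drop across the shunt; the function names, the sample readings, the 0.1 ohm shunt and the 3.3 V supply are all hypothetical values chosen for illustration.

```python
def supply_current(v_shunt, r_shunt):
    """Current through the shunt resistor in amperes (Ohm's law)."""
    if r_shunt <= 0:
        raise ValueError("shunt resistance must be positive")
    return v_shunt / r_shunt


def average_power(v_shunt_samples, r_shunt, v_supply):
    """Average power drawn by the processor over the sampled window, in watts.

    The processor sees the supply voltage minus the drop across the shunt,
    so each sample contributes (v_supply - v_shunt) * current.
    """
    if not v_shunt_samples:
        raise ValueError("need at least one sample")
    total = 0.0
    for v_shunt in v_shunt_samples:
        current = supply_current(v_shunt, r_shunt)
        total += (v_supply - v_shunt) * current
    return total / len(v_shunt_samples)


# Hypothetical shunt voltages (volts) captured while a test loop executes.
samples = [0.010, 0.012, 0.011, 0.011]
print(average_power(samples, r_shunt=0.1, v_supply=3.3))
```

Averaging over many samples of a loop that repeats one instruction approximates the cost of that instruction, which is why the measurement is taken during repeated execution.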

The second method is based on simulating the microprocessor and the effect of the instruction set on it. A lower-level simulation can provide an estimate of the current drawn, from which the power consumption of each instruction is calculated. Once the modules activated by each instruction have been determined, the consumption of a given instruction can be calculated as the sum of the energies of its active modules.
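The module-sum calculation can be sketched as a table lookup. The module names, the per-module energies (in picojoules) and the instruction-to-module mapping below are hypothetical placeholders, not values taken from any real processor or from the cited work.

```python
# Hypothetical energy per activation of each module, in picojoules.
MODULE_ENERGY_PJ = {
    "fetch": 200,
    "decode": 100,
    "alu": 300,
    "regfile": 150,
    "memory": 600,
}

# Hypothetical map from each instruction to the modules it activates.
ACTIVE_MODULES = {
    "add": ["fetch", "decode", "alu", "regfile"],
    "load": ["fetch", "decode", "alu", "regfile", "memory"],
    "nop": ["fetch", "decode"],
}


def instruction_energy(instruction):
    """Energy of one instruction as the sum over its active modules."""
    return sum(MODULE_ENERGY_PJ[m] for m in ACTIVE_MODULES[instruction])


for name in ACTIVE_MODULES:
    print(name, instruction_energy(name), "pJ")
```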

Power consumption in microprocessors can be reduced either by hardware, software or both.

3.1 Hardware methods

Hardware methods can be implemented at either the circuit level or the logic level [6].

The following are the circuit-level methods for low-power design. They include technologies for reducing dynamic power and for reducing leakage power.

3.1.1 Standard Transistor Size Optimization

Both high performance and low power are required in current microprocessor design. However, the methods that achieve higher frequency mostly increase power consumption. Larger transistors used to be preferred in processor design. But when both frequency and power are taken into account, the optimal average transistor size turns out to be rather small.

3.1.2 Reduce Junction Capacitance

Junction capacitance, not only wiring capacitance, is important for power reduction. Cell circuit and layout techniques can reduce this capacitance and thereby the power.

3.1.3 Repeater & Buffer Trees

Instead of transistors with large drive strength, multi-stage repeaters for long-distance signal transfer can sometimes reduce power.

3.1.4 Low Voltage Operation

Low-voltage operation is the key to low power because dynamic power is roughly proportional to the square of the supply voltage. Circuit techniques that allow low-voltage operation are required.

3.1.5 Control of Substrate Voltage

Substrate voltage control can reduce not only standby leakage current but also leakage current during real operation.

3.1.6 Power Switch

The final solution for leakage current reduction is to power off. On-chip power switch technology is therefore important.

The following are the logic-level methods for low-power design. They include technologies for reducing dynamic power and for reducing leakage power.

3.1.7 High Performance at Low Frequency

Dynamic power is proportional to the operating frequency. If the same application performance can be achieved at a lower frequency, dynamic power is reduced. Architectural improvements such as superscalar execution and wider bit widths are one way to reduce dynamic power. The dynamic power consumption is given by the formula

$$P_{dynamic} = a \, C \, V^{2} \, f$$

where
a is the activity factor,
C is the switched capacitance,
V is the supply voltage,
f is the clock frequency.

From the above equation it can be seen that dynamic power consumption can be reduced in four ways: i) reducing the physical capacitance, or stored electrical charge, of a circuit, ii) reducing the switching activity, iii) reducing the clock frequency, and iv) reducing the supply voltage.
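The effect of each of the four factors can be checked numerically. The sketch below assumes the standard CMOS relation P = a·C·V²·f for the four quantities defined above; the baseline values are hypothetical and serve only to show the scaling.

```python
def dynamic_power(a, c, v, f):
    """Dynamic power in watts from activity factor a, switched capacitance c
    (farads), supply voltage v (volts) and clock frequency f (hertz)."""
    return a * c * v ** 2 * f


# Hypothetical baseline operating point.
base = dynamic_power(a=0.2, c=1e-9, v=1.2, f=500e6)

# Halving a, C or f halves the power; halving V quarters it.
half_activity = dynamic_power(a=0.1, c=1e-9, v=1.2, f=500e6)
half_frequency = dynamic_power(a=0.2, c=1e-9, v=1.2, f=250e6)
half_voltage = dynamic_power(a=0.2, c=1e-9, v=0.6, f=500e6)

print(base, half_activity / base, half_frequency / base, half_voltage / base)
```

The quadratic dependence on V is the reason supply voltage reduction is usually the most effective of the four levers.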

3.1.8 Instruction Set Architecture

An instruction set can be designed for high frequency and for low power. Reducing the switching ratio is the target of this design.

3.1.9 Gated Clock

Accuracy at the gate level and good correlation to hardware are key factors for a reliable power reduction methodology. Clock power is a major component of microprocessor power, mainly because the clock is fed to most of the circuit blocks in the processor and switches in every cycle. Clock gating is an effective way of reducing power dissipation in digital circuits. In a general-purpose microprocessor, only a portion of the circuit is active at any given time. System-on-a-chip (SOC) design has become popular. A typical SOC design consists of one or more central processing units (CPUs), Random Access Memory (RAM) banks, bus interface units, input-output and memory controllers, a floating point coprocessor, and so on. By shutting down the idle units, it is possible to prevent the circuit from consuming unnecessary power. In addition, a portion of the clock tree can be shut down by masking off the clock at an internal node of the tree using an AND gate. This prevents wasteful switching in the clock tree and saves power there, in addition to saving power in the functional units fed by the clock. The power saved in the clock tree and the modules is large enough to compensate for the additional power consumed by the control logic and the enable signal routing [7][8]. The deterministic clock-gating (DCG) technique effectively reduces clock power. DCG is based on the key observation that, for many of the pipeline stages of a modern processor, the circuit block usage in the near future is known a few cycles ahead of time [9].
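The AND-gate masking described above can be illustrated with a toy signal-level model. This is a sketch only: it samples the clock twice per cycle, uses a hypothetical enable pattern, and ignores the glitch-avoidance circuitry that a real gated clock needs.

```python
def gated_clock(clk, enable):
    """Mask the clock with an enable signal using a bitwise AND per sample."""
    return [c & e for c, e in zip(clk, enable)]


def count_toggles(signal):
    """Number of 0->1 and 1->0 transitions, a proxy for switching power."""
    return sum(1 for prev, cur in zip(signal, signal[1:]) if prev != cur)


# Eight clock cycles, sampled as low/high half-cycles.
clk = [0, 1] * 8
# Hypothetical enable: the unit is idle during the middle four cycles.
enable = [1] * 4 + [0] * 8 + [1] * 4

print(count_toggles(clk))                       # free-running clock
print(count_toggles(gated_clock(clk, enable)))  # gated clock
```

With this enable pattern the gated clock toggles 7 times instead of 15, so the idle unit and the clock subtree below the gate see less than half of the switching activity.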

3.1.10 Reduce Memory Access

One of the major sources of power consumption in a microprocessor is the memory macro. Introducing multiple small memories and avoiding unnecessary memory activation can reduce this power. The following technique reduces memory-access-related power consumption by reducing the number of data transfers between the processor and memory, or between a higher and a lower level of memory, using source program transformation [10]. The procedure involves profiling, inlining, and global transformation [11]. In profiling, the number of data array reads and writes for the various parts of the program is first obtained. From the profile, the data array accesses and the call paths of these arrays are identified and used for further transformation. Function inlining replaces a call to a function or subroutine with the body of that function. The aim of inlining is to enlarge the exploration space for optimization. At the end of this step, an inlined version of the source code is obtained. Global transformations are then applied to the inlined version of the source code and yield the reduction in memory accesses. They aim to obtain better data locality and to increase the exploitation of data reuse, which reduces the power required.
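The profiling and transformation steps can be illustrated with a toy access counter. The sketch below is not the procedure of the cited work: `CountingArray` is a hypothetical helper that counts array reads and writes, and loop fusion is used as one example of a transformation that improves reuse by removing an intermediate array.

```python
class CountingArray:
    """List wrapper that counts element reads and writes (a toy profiler)."""

    def __init__(self, data):
        self._data = list(data)
        self.reads = 0
        self.writes = 0

    def __getitem__(self, index):
        self.reads += 1
        return self._data[index]

    def __setitem__(self, index, value):
        self.writes += 1
        self._data[index] = value

    def accesses(self):
        return self.reads + self.writes

    def to_list(self):
        return list(self._data)


def separate_loops(values):
    """Two passes that communicate through an intermediate array."""
    n = len(values)
    src = CountingArray(values)
    tmp = CountingArray([0] * n)
    out = CountingArray([0] * n)
    for i in range(n):
        tmp[i] = src[i] * 2
    for i in range(n):
        out[i] = tmp[i] + 1
    total = src.accesses() + tmp.accesses() + out.accesses()
    return out.to_list(), total


def fused_loop(values):
    """Same computation after fusion: the intermediate stays in a scalar."""
    n = len(values)
    src = CountingArray(values)
    out = CountingArray([0] * n)
    for i in range(n):
        doubled = src[i] * 2  # held in a local instead of an array element
        out[i] = doubled + 1
    return out.to_list(), src.accesses() + out.accesses()


print(separate_loops([1, 2, 3, 4]))
print(fused_loop([1, 2, 3, 4]))
```

Both versions produce the same result, but the fused loop performs half as many array accesses because the intermediate value is reused immediately instead of being written to and read back from memory.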

3.1.11 On Chip Memory, On Chip Integration

Current microprocessors operate at low voltages such as 1.2 V or 1.0 V. However, the external interfaces of the chip still use 3.3 V or 2.5 V for board operation. Reducing the I/O power is therefore one of the key ideas. The simplest method is to integrate parts inside the chip, which can remove a minimum of 10 pins.

3.1.12 Optimization of Sleep Mode

The sleep, standby, or power-off mode can be designed according to the system's usage scenario. Optimizing the sleep mode reduces the power in the situations that matter most to the user. In sleep mode the processor performs no processing and therefore consumes very little power, so wasted power is greatly reduced.

3.1.13 Dynamic Control of Supply Voltage

A low supply voltage is key to low power because power consumption is proportional to the square of the supply voltage.

Dynamic control of the supply voltage and operating frequency, lowering both when less CPU performance is requested, can reduce power drastically. Dynamic voltage scaling is a power management technique in which the voltage used in a component is increased or decreased depending on circumstances. Dynamic voltage scaling to increase voltage is known as overvolting; dynamic voltage scaling to decrease voltage is known as undervolting. Undervolting is done to conserve power, whereas overvolting is done to increase computer performance or, in rare cases, to increase reliability.
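A simple policy for dynamic voltage and frequency control is to pick the slowest operating point that still meets the deadline of the current task. The sketch below assumes a hypothetical table of (frequency, voltage) pairs and the relation P = a·C·V²·f, so that the dynamic energy of a task with a fixed cycle count depends only on V²; the activity factor and capacitance values are placeholders.

```python
# Hypothetical operating points: (frequency in Hz, supply voltage in V).
OPERATING_POINTS = [(200e6, 0.9), (400e6, 1.0), (800e6, 1.2)]


def pick_operating_point(cycles, deadline_s, points=OPERATING_POINTS):
    """Return the lowest-frequency point that finishes `cycles` in time."""
    for frequency, voltage in sorted(points):
        if cycles / frequency <= deadline_s:
            return frequency, voltage
    raise ValueError("deadline cannot be met at any operating point")


def dynamic_energy(cycles, voltage, a=0.2, c=1e-9):
    """Energy = power * time = (a*C*V^2*f) * (cycles/f) = a*C*V^2*cycles."""
    return a * c * voltage ** 2 * cycles


light_task = pick_operating_point(cycles=1e6, deadline_s=0.01)
heavy_task = pick_operating_point(cycles=6e6, deadline_s=0.01)
print(light_task, heavy_task)

# Energy saved on the light task compared with always running at 1.2 V.
ratio = dynamic_energy(1e6, light_task[1]) / dynamic_energy(1e6, 1.2)
print(ratio)
```

For the light task the 0.9 V point suffices, and the dynamic energy falls to about 56% of what the same task would cost at 1.2 V.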

3.1.14 Logical Partitioning for Power Switching and Clock Stop

Logical partitioning is the process of dividing the entire processor into a number of modules. The partition is logical, and each module has a separate function. Stopping the clock of a module or powering off a module can reduce power. A logic hierarchy that allows power-off module by module should be designed carefully, because some key functions in a module may still be needed by the system that uses the microprocessor.

3.2 Software Methods

The power consumed as a result of the software running on a microprocessor can be reduced at different levels: the compiler, the low-level language, or the high-level language.

At the compiler level, there is an approach for compiler-directed dynamic placement of instructions into a low-power code cache [12]. This work showed that energy savings can be achieved by applying dynamic placement techniques. Compiler optimizations have also been applied to memory power consumption [13], since memory is one of the main sources of power dissipation. A compilation technique for lower power consumption has been proposed based on the energy consumed by the instruction register (IR) and the register file decoder [14]. In [15], different techniques for designing power-efficient compilers are analyzed, and a set of software optimization techniques for compiler code generation is presented. Leakage power in the L1 instruction caches of high-performance microprocessors can be reduced by eliminating basic blocks from the cache as soon as they are dead [16]. The compiler identifies these basic blocks from the control flow graph of the program. This mechanism yields an average reduction of about 5% to 16% in the energy consumed for different sizes of I-cache.

At the low-level language level, an instruction-level method was designed by analyzing the problem of power consumption from software running on a microprocessor; it allows the current and power drawn by each instruction on the processor to be examined [17]. In [18], the assembly code generated by an optimizing compiler was analyzed by assigning a power cost to each instruction, and it was concluded that minimizing execution time reduces the power consumption of the system.
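Assigning a cost to each instruction makes it possible to compare two versions of a program by summing over their instruction traces. The cost table and the two traces below are hypothetical and in arbitrary units; they only illustrate the bookkeeping, not measured values for any processor.

```python
# Hypothetical per-instruction cost in arbitrary energy units.
BASE_COST = {"mov": 3, "add": 4, "mul": 9, "load": 12, "store": 11}


def program_cost(trace):
    """Total cost of an executed instruction trace."""
    return sum(BASE_COST[op] for op in trace)


# Hypothetical traces: the optimized version drops one redundant store.
original = ["load", "mul", "add", "store", "load", "mul", "add", "store"]
optimized = ["load", "mul", "add", "load", "mul", "add", "store"]

print(program_cost(original), program_cost(optimized))
```

In this toy example the shorter trace is also the cheaper one, which mirrors the conclusion that reducing execution time tends to reduce consumption.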

Among high-level language optimization techniques, the following loop-oriented techniques can be used: loop unswitching, loop peeling, scalar expansion, loop fusion, loop alignment, loop fission, nested loops, loop reversal, loop unrolling, and loop interchanging [19] [20]. Other optimization techniques are described below.
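As an illustration of one of the loop-oriented techniques listed above, the following sketch shows loop unrolling by a factor of four. It shows only the shape of the transformation; the function names are hypothetical, and any saving from fewer loop-control operations would appear in compiled code on the target processor rather than in this Python form.

```python
def sum_rolled(values):
    """Straightforward loop: one addition and one loop test per element."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled_by_4(values):
    """Same sum with the body replicated four times per iteration."""
    total = 0
    n = len(values)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, limit, 4):
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
    for i in range(limit, n):  # leftover elements
        total += values[i]
    return total


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))
```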

3.2.1 Instruction Reordering

The first technique is to reorder instructions in order to reduce switching. The energy consumed during the execution of a particular instruction depends on the previously executed instruction. Thus, lower energy can be achieved by an appropriate reordering of the instructions in a program. On certain DSP processors, this technique can reduce power consumption by between 30% and 65% [21] [22].
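The idea can be sketched by treating the number of bit changes between consecutive instruction encodings as the switching cost and searching for the cheapest order. This is a simplified model, not the method of the cited work: the 4-bit encodings are hypothetical, the instructions are assumed to be mutually independent so that any order is legal, and the exhaustive search is only practical for very small blocks.

```python
from itertools import permutations


def hamming(a, b):
    """Number of bit positions in which two encodings differ."""
    return bin(a ^ b).count("1")


def switching_cost(sequence):
    """Total bit changes between consecutive encodings in the sequence."""
    return sum(hamming(x, y) for x, y in zip(sequence, sequence[1:]))


def best_order(encodings):
    """Exhaustively find an order with minimum switching cost."""
    return min(permutations(encodings), key=switching_cost)


# Hypothetical 4-bit encodings of four independent instructions.
block = [0b0000, 0b1111, 0b0001, 0b1110]
reordered = best_order(block)

print(switching_cost(block), switching_cost(reordered))
```

For this block the original order costs 11 bit changes and the best order costs 5. A real scheduler would also have to respect data dependencies between instructions.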

3.2.2 Cycle-to-cycle Minimization

There are two main approaches to register-transfer (RT) level power reduction [23]. The first is to minimize the power required to implement a function, and the second is to minimize how often the function needs to be executed. A newer method reduces AC power by minimizing unnecessary cycle-to-cycle toggles. In this method, the next-cycle control signals are derived as a function of the next-cycle function together with the previous-cycle state. By minimizing cycle-to-cycle functional toggle requirements, power can be saved independently of how the function is implemented. The idea is simple: if a functional path is not required in the current cycle, all its control paths and data paths remain in the previous cycle's state, eliminating all node toggles.

3.2.3 Pseudo-Microcode

The two common decoding techniques are distributed decode and microcode [23]. In distributed decode, each logic control signal is derived as an independent cone of logic. In microcode, each opcode is translated to an entry point address in a Read Only Memory (ROM) or Programmable Logic Array (PLA) that provides the complete set of control signals. Pseudo-microcode combines the simplicity of microcode with the flexibility of distributed decode. Figure 3 gives the structure of the pseudo-microcode. For every opcode, all control signals that can be set by that opcode are bundled together as a single entity. These include the pipeline control, the address generation control, and the execute control bits. The microcodes are decoded in each microcode unit to determine the instruction type. Once the instruction type is determined, the value of every control signal within that microcode unit is set. An advantage of this type of microcode is the ability to embed logic within the microcode, such as conditional checks within an opcode type. This can reduce the number of microcode addresses, which may reduce the overall size of the decode unit and so save power.

3.2.4 Code Generation

The next method is code generation through pattern matching [24]. This technique modifies a code generator that targets the number of execution cycles so that it targets energy reduction instead. The resulting code was similar to the code generated when targeting cycles. The average power multiplied by the number of clock cycles, and by the clock period, gives the energy cost of an instruction pattern.

3.2.5 Reduction in Hamming Distance

The following method reduces the power consumption of a pipelined microprocessor system arranged to run a program stored in memory. The method comprises duplicating at least one branch instruction so as to reduce the number of transitions on the bus between the microprocessor and the memory when the program is executed. It aims at improving the processor's average inter-instruction Hamming distance.

The Hamming distance between two binary numbers is the number of bits that differ between them. Table 1 gives the Hamming distances between 5 and 6 and between 0 and 15. The four-bit binary equivalent of 5 is 0101, of 6 is 0110, of 0 is 0000, and of 15 is 1111. For 5 and 6, the first two bits (01) are the same, but the third and fourth bits change from 01 to 10. Since two bits change, the Hamming distance is 2. For 0 and 15, all four bits change from 0000 to 1111, so the Hamming distance is 4.

Hamming distance is related to power because binary numbers are represented by electrical signals. A steady low voltage on a wire represents a binary 0 bit, and a steady high voltage represents a binary 1 bit. A number is represented using these voltage levels on a group of wires called a bus. Power is used when the voltage on a wire changes. The amount of power depends on the magnitude of the voltage change and on the capacitance of the wire, which in turn depends on the physical dimensions of the wire. So when the number represented on the bus changes, the power consumed depends on the number of bits that change, that is, on the Hamming distance. Therefore, reducing the average Hamming distance between successive values on a high-capacitance bus, while keeping all other aspects of the system the same, lowers power consumption.
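The Hamming distance calculation and its use as a measure of bus activity can be sketched in a few lines. The function names and the two example bus sequences are hypothetical; the 5/6 and 0/15 pairs are the ones worked through in the text.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two non-negative integers differ."""
    return bin(a ^ b).count("1")


def bus_transitions(values):
    """Total number of wire toggles when the values appear on a bus in order."""
    return sum(hamming_distance(prev, cur) for prev, cur in zip(values, values[1:]))


print(hamming_distance(5, 6))    # 0101 vs 0110
print(hamming_distance(0, 15))   # 0000 vs 1111

# Two hypothetical sequences of four values on a 4-bit bus.
print(bus_transitions([0, 15, 0, 15]))  # every wire toggles on every transfer
print(bus_transitions([0, 1, 3, 7]))    # one wire toggles per transfer
```

The XOR of two values has a 1 exactly where their bits differ, so counting the 1 bits gives the Hamming distance. The second sequence causes a quarter of the toggles of the first, which is the kind of reduction the method aims for on a high-capacitance bus.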

The goal of low-power design is to reduce wasted power by powering down blocks that are not needed when the CPU is idle and by switching only as much capacitance as needed when the CPU is active [25][26]. Knowledge of the architecture is important before applying power reduction techniques, because not every technique can be used in every processor. The challenge of low-power microprocessor design is that power should be saved with little or no impact on performance and area. Power reduction should not degrade the performance of the processor; if it does, there is little benefit in reducing the power. In battery-operated portable devices, area is the main constraint, so a low-power microprocessor design should not violate the area constraint. While most of the existing low-power design techniques are applicable, their choice and judicious usage are perhaps the most important factors. As technology moved from 180 nm to 130 nm and 90 nm, the challenge facing processor design became leakage, or static, power, because it approaches the level of dynamic power [26]. Therefore special attention is needed when designing microprocessors for low power consumption in these newer technologies.

Reducing power consumption is an important issue in microprocessor design, since microprocessors are now widely used in battery-operated portable devices. In this paper, the various forms of power consumption in the processor are discussed. In addition, a number of hardware and software techniques for reducing the power consumption of the processor are surveyed. The challenges of designing a processor for low power consumption are examined; they show that, when these techniques are implemented, it must be ensured that they do not degrade the performance, reliability, and efficiency of the processor.

[1]. Venkatachalam, V. and Franz, M. (2005). “Power Reduction Techniques For Microprocessor Systems”, ACM Computing Surveys, Vol. 37, No. 3, September, pp. 195–237.

[2]. Stephen H Gunther, Frank Binns, Douglas M. Carmean, Jonathan C. Hall, “Managing the Impact of Increasing Microprocessor Power Consumption”, Desktop Platforms Group, Intel Corp.

[4]. Hattori, T. (2003). “Design Methodology of Low-Power Microprocessors”, Design Automation Conference, Proceedings of the ASP-DAC, Asia and South Pacific. Hitachi, Ltd., pp. 390-393.

[7]. Oh, J., & Pedram, M. (2001). “Gated Clock Routing for Low-Power Microprocessor Design”, IEEE Transactions On Computer-Aided Design Of Integrated Circuits And Systems, June, Vol. 20, No. 6, pp.715-722.

[8]. Oh J., & Pedram M., (1998). “Power reduction in microprocessor chips by gated clock routing,” in Proc. Asia South Pacific Design Automation Conf., Jan. 1998, pp. 313–318.

[9]. Sulaiman, D R. (2008). “Using Clock gating Technique for Energy Reduction in Portable Computers”, IEEE C, Computer and Communication Engineering, pp 839- 842.

[10]. Hai Li, Bhunia, S., Yiran Chen, Kaushik Roy, Fellow, IEEE, & Vijaykumar T. N., (2004). “DCG: Deterministic Clock-Gating for Low-Power Microprocessor Design”, IEEE Transactions On Very Large Scale Integration (VLSI) Systems, March, Vol. 12, No. 3, pp 245-254.

[11]. Shan Li, Edmund M K Lai, Absar, M J. (2003). “Minimizing Embedded Software Power Consumption through Reduction of Data Memory Access”, IEEE conf. pp 309-313.

[12]. Ravindran R A., Nagarkar P D., Dasika G S., Marsman E D., Senger R M., Mahlke S A., & Brown R B., (2005) “Compiler managed dynamic instruction placement in a low-power code cache,” International Symposium on Code Generation and Optimization, pp 179 – 190.

[13]. Zambreno J., Kandemir M T., & Choudhary A., (2002). “Enhancing compiler techniques for memory energy optimizations,” Embedded Software. Second International Conference, EMSOFT 2002, 2491:364 – 381.

[14]. Mehta H., Owens R., Irwin M., Chen R., & Ghosh D., (1997). “Techniques for low energy software,” ISLPED - International Symposium on Low Power Electronics and Design, pp 72 – 75.

[16]. Mohan G. Kabadi, Kannan, N., Chidambaram, P., Suriya Narayanan, Subramanian M., & Ranjani, P., “Dead-Block Elimination in Cache: A Mechanism to Reduce I-cache Power Consumption in High Performance Microprocessors”, School of Computer Science and Engineering, Anna University.

[17]. Tiwari V., Malik S., Wolfe A., (1994). “Power analysis of embedded software: a first step towards software power minimization,” Proceedings of the IEEE Conf. on Computer Aided Design, Santa Clara CA Nov, pp. 384- 390.

[18]. Russell J T., & Jacome M F., (1998). “Software power estimation and optimization for high performance, 32-bit embedded processors,” International Conference on Computer Design: VLSI in Computers and Processors, pp 328 – 333.

[21]. Sarta, D., Trifone, D., Ascia, G. (1999). “A data dependent approach to instruction level power estimation”. Low-Power Design,1999. Proceedings. IEEE Alessandro Volta Memorial Workshop on, pp (s): 182 -190

[22]. Wiratunga, S., Gebotys C. (2000). “Methodology for minimizing power with DSP code”. Electrical and Computer Engineering, 2000 Canadian Conference on, pp. 293 -296 vol.1.

[24]. Tiwari, V., Malik, S., Wolfe, A. (1994), “Compilation techniques for low energy: an overview”. Low Power Electronics,1994. Digest of Technical Papers., IEEE Symposium, pp: 38 –39.

[26]. Suresh, R. (1996). “Challenges in Low-Power Microprocessor Design”, Microprocessor Technology, Intel Corporation, 9th International Conf. on VLSI Design, Jan., pp. 329-330.

[27]. Lichtenau, C., Alberto Garcia Oruz, & Pflüger, T. (2005). “Technological and Architectural Power Optimizations for Advance Microprocessors”, IEEE conf., pp. 11-14.

A Survey On Hardware And Software Optimization Of Microprocessors For Low Power Consumption