VLSI Modeling of Low Power Razor Based Programmable Truncated Multiply and Accumulator for Efficient Digital Signal Processing

M. Poornima * P. Lokesh ** T. Muni Reddy ***

*,*** Associate Professor, VEMU Institute of Technology, Chittoor, AndhraPradesh, India.

** Assistant Professor, VEMU Institute of Technology, Chittoor, AndhraPradesh, India.

Abstract

In this paper the power consumption is greatly is minimized by using efficient fault Tolerant Techniques. In a VLSI architecture one has to integrate millions of circuits on printed circuit board, at the same time the designer faces difficulty to deal with the Faulty ICs incorporated in the design. The faulty chips found in the circuit that needs to be replaced earlier may lead to the damage of the circuit entirely. For this the effective fault tolerant techniques are proposed. Some of them are Design for testability, Built in Self Test, Formal verification, and Functional Verification. The VLSI is a trade-off between Design Engineer and Test Engineer. Design Engineer focuses on efficient integration of millions of transistors on a PCB whereas the role of test engineer is to find fault in the circuit. In existing architecture independent programmable truncated multipliers are used to verify fault circuits and to achieve low power consumption benefits at the output Signal to Noise Ratio. But the method suffers with delay degradation factor. In this brief, the authors use programmable Truncated multiplier within the Digital Signal Processing architecture. With this the supply voltage is minimized. This fault technique improves the performance of fault designs, and reduces error correction burden. The Simulation and Synthesis results are verified on Xilinx14.3 design suite tool with Virtex6 FPGA prototyping Hardware Environment. The design summary results show that the proposed architecture consumes less power when compared to the previous architectures.

Keywords :

Digital Signal Processing (DSP),
Low Power,
Razor,
Reconfigurable Multiplier,
Truncated Multiplication.

Introduction

Voltage scaling provides an important role to lower power consumption in VLSI circuits, because scaling the supply voltage by a factor of K results in reductions in the dominating dynamic power consumption by a factor of K2 and yields static power benefits. However, progresses in CMOS innovation scaling added to an exponential development of configuration issues obtained from Process Voltage Temperature (PVT) varieties, regularly bring about moderate outlines that prompt a powerful utilization. A portion of the exemplary outline timing imperatives can be casual in Digital Signal Processing (DSP) frameworks by applying unusual Voltage Over Scaling (VOS) levels to additionally enhance vitality utilization levels while keeping up flag handling execution.

Two of the main streams for providing error-resiliency against timing violations are:

Techniques that introduce an estimation or prediction subsystem that monitors the system output and provides an approximation if a fault is detected.
Techniques that modify the data capture by augmenting the latches or flip-flops on the critical path and allotting extra execution time for operations that need a long execution time.

Advantages

Acceptable circuit performance.
As the Truncated multipliers are smaller than full precision ones, they achieve improvements in power consumption, area and result in different timing distributions.
The existence of synergic benefits derived from the combination of truncated multiplication and voltage scaling using a fault tolerance strategy is present in this paper where both techniques are applied to a custom designed fixed point Multiply and Accumulate (MAC) structure.

Disadvantages

Signal degradation.
Execution time penalties.

Power savings obtained by fault tolerant techniques are dependent on both PVT variations and the circuit physical design, but are also influenced by the data input to the circuit, as the statistical timing distribution defines the percentage of samples estimated and/or corrected, thus conditioning the maximum power savings obtainable using such techniques.

Remedy

Using Truncated multiplication as a means of achieving both power and area improvements in the field of arithmetic circuit design, at the expense of signal degradation.

1. Existing System

1.1 Voltage Scaling Beyond Vdd−crit

Power consumption is the important concern in many Arithmetic Logic Units, meanwhile the energy consumed by a digital gate is denoted as

(1)

whereas α0 →1 is defined as the average quantity of times in each clock cycle (at a frequency fclk) that a node with capacitance cl makes a energy ingesting transition and lowering the deliver voltage with the aid of an issue of good effects in a quadratic improvement within the power intake of CMOS common sense. If we reduce the supply voltage by a factor of “K”, it results in the improvement of power consumption rate in CMOS Logic [8]. The connection between the circuit delay (τ_d) and the deliver voltage v_dd is given through,

(2)

where cl is the load capacitance,β is the gate transconductance, v_t is the tool threshold voltage, and α is the velocity saturation index. The crucial supply voltage of a given structure vdd−crit, is reffered as the minimal deliver voltage where timing on the critical path is met for any expected pvt versions.

Scaling the supply voltage to v_dd = okay vdd−crit, wherein 0 < k < 1 is referred to as VOS; despite the fact that this method effects in addition power discounts almost proportional to k2, scaling v_dd under the crucial supply voltage results in essential timing failures for positive input combinations under sure pvt conditions.

1.2 Razor and Fault Tolerance for Timing

The razor approach is a technique to apply dynamic voltage scaling by using dynamic detection and correction of circuit timing errors. Through measuring the error price in the circuit, the deliver voltage may be tuned even as the circuit is in operation, easing the requirements imposed by using conservative timing evaluation. Implementation problems of razor together with its required hardware overhead [5], wherein razor ii and bubble razor were introduced and tested inside a complete system with reduced location and timing overheads, and razor is applied to an excessive-speed actual-time finite-impulse reaction (FIR) filter. The efficiency of razor, and the limits regarding vdd scaling depends upon the circuit timing distribution. Therefore, for any circuit imposing razor, reducing the amount of time required to perform the average and slowest operation will considerably enhance razor merits. This is the incentive for thinking about the truncated multiplier, which exhibits a timing profile distinctive from the usual multiplier. The razor approach is a technique to apply dynamic voltage scaling by using dynamic detection and correction of circuit timing errors. Through measuring the error price in the circuit, the deliver voltage may be tuned even as the circuit is in operation, easing the requirements imposed by using conservative timing evaluation.

1.3 Truncated Multiplication

Truncated multipliers allow power, area, and timing improvements by not considering the implementation of sections of the least significant part of the partial product matrix. Without calculating the full-precision output, the output is that from the sum of the first (N + h) columns (where 0 ≤ h ≤ N), where N is the operand width, plus an estimation of the erased bits. Truncation permits the way of reducing the complexness of the number unit by replacing the lower components of the partial product matrix by a smaller compensation circuit, and its variants vary from terribly sharply truncated applications to reliably rounded truncated multipliers [2], [9].

Programmable and configurable approaches to truncated multiplication use fixed-width structures which will be operated at reduced resolutions by disabling components of the partial product generation [3], [6]. The introduction of programmable truncation in a fixed-point multiplier allows changes not only in decreasing power, but also in increasing the performance of the system [1]. PTM also modifies the Original Critical Path (OCP) of such arithmetic block, making the architecture virtually faster where the Active Critical Path (ACP) τACP < τOCP. These special considerations make the VLSI design react with high performance and helps to achieve low power consumption and thus PTM finally tradeoff the Area, Delay, and power limits as these are crucial for any VLSI modeling.

2. Proposed System

2.1 PTMAC- A Flexible Low Power DSP with PTM

To enlarge the usage of PTM to general DSP architectures, the authors propose programmable Truncated Multiplier and Accumulator (PTMAC). PTMAC, designed as a technique to exercise PTM in low-power medical specialty applications with a necessity for modest DSP like graph filtering or fault detection.

The proposed DSP, as depicted in Figure 1, includes a control unit operating in a five-stage pipeline, program and memory blocks in a multibus Harvard configuration, some I/O connectivity and an arithmetic unit consisting of a MAC structure with a 16-bit PTM, a 40-bit accumulator, and a 40- bit barrel shifter for scaling and rotating the accumulated value. The total gate count of the original PTMAC chip is 48 k, and it is estimated (postsynthesis) to have a maximum power consumption of 79.46 μW/MHz.

Figure 1. PTMAC Top Level Diagram

It is noticed from the Timing analysis of the proposed PTMAC design that the crucial path is found inside the structure of the arithmetic unit; thus, energy savings derived from the appliance of voltage scaling approaches are unnatural by the signal propagation time through the arithmetic unit. An experimental approach to mix the delay-modulation capabilities of programmable truncation and therefore the advantages of fault tolerance is explored within the following sections as some way to realize a versatile unit that trades energy for signal and performance degradation.

Drawbacks

Signal and performance degradation.

2.2 Razor-based PTMAC, Low Power DSP via Delay Modulation

The combination of a PTM and a fault tolerant system permits such a system to modulate the common and most delay times within the m_ack unit at run time. Therefore, the amount of errors that require correction at any V_dd level are often cut down by reducing the multiplier factor accuracy.

To explore the independent benefits and interactions between fault tolerance and truncated multiplication, Razor PTMAC was designed as an evolution of PTMAC.

2.3 Razor Implementation

To achieve the fault tolerance, the accumulator unit of the PTMAC was replaced by a fault tolerant version named Razor Accumulator where the original flip-flops were substituted by a version of the Razor registers [4]. The argumented cells were designed and hold on as library cells for post synthesis insertion. Such a cell follows the first implementation. In Razor implementation, replaces the shadow latch at intervals and the Razor registers with a shadow-flip-flop to avoid synthesis problems. The meta stability detector needed in Razor implementations was modelled because the delay of associate degree electrical converter more as a constraint to the hold time of the Razor accumulator. During this approach, all temporal arrangement violations probably inflicting meta stability area unit are then detected as temporal arrangement errors, providing a boundary for the performance of Razor. Figure 2 shows the executions of five instructions in the Razor PTMAC pipeline.

Figure 2. Execution of Five Instructions in the Razor PTMAC Pipeline. With the four stages of the Razor Error Detection- Correction, Rounding Plus Final Addition

Static timing analysis of PTMAC demonstrated that the only registers situated at potentially critical paths within PTMAC were located in the accumulator, as the multiplication and accumulation of the input data is performed within a clock cycle. Therefore, flip flops capturing the 10 most significant bits of the accumulator were replaced by Razor flip-flops. Insertion of the Razor flip-flops and the associated control logic resulted in an increase of 18% of the core area.

Since the hold constraint only limits the maximum duration of the positive clock phase and does not affect the clock frequency, a single clock was utilized to drive both main and razor flip-flops with both transition edges providing flexibility to configure the extra time allowed by the shadow registers by configuring the duty cycle of the clock.

3. Simulation and Synthesis Results

The paper was solved by using the advanced simulation and synthesis tool like Xilinx14.3 design suite and the targeted hardware component is Virtex6 FPGA. The top module of PTMAC is shown in Figure 3, which was designed for 8*8 multiplication.

Figure 3. PTMAC Top Module

The simulation waveforms in binary form are shown in Figure 4 and the unsigned form is shown in Figure 5. The Razor register top module is illustrated in Figure 6.

Figure 4. PTMAC Results

Figure 5. Simulation Waveforms Truncated Results

Figure 6. Razor Register Top Module

The design summary results to the target hardware FPGA Virtex-6 shown in Figure 7 represents the number of hardware components required to implement the design before modelling on ASIC. In this paper the authors have observed the advantages and limitations of existing methodologies by taking the references from [7] and the obtained results are presented in Table 1.

Figure 7. Design Summary Report

Table 1. Comparison of Results

The power analysis is calculated by using Xilinx power analyzer tool and is shown in Figure 8.

Figure 8. On Chip Power Consumption

Conclusion

In this paper for efficient digital signal processing the Razor based truncated multiplier is used for finding faults in the circuit. The delay and power effects of voltage scaling with error correction and the application of programmable truncated multiplier reflects the efficient power consumptions. The use of this truncated multipliers results in 1) reducing power at the multiplier by means of cancelling the switching pastime and 2) disabling the multiplier important direction, for this reason reducing the error healing overheads of razor, and extending the relevant v_dd variety. Results mentioned in Table 1 show that the results of each techniques to the proposed DSP unit enable most energy savings of 24.8%, rising the results obtained by implementing programmable truncation, fault tolerance via Razor, and also the most optimistic prediction for the mixture of each techniques (24.4%). This means that the delay-modulation properties of truncated multiplication may be exploited to enhance the energy consumption of fault tolerant DSP architectures wherever multipliers area unit are concerned within the vital path of the circuit.

References

[1]. Chandrakasan, A. P., Potkonjak, M., Mehra, R., Rabaey, J., & Brodersen, R. W. (1995). Optimizing power using transformations. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 14(1), 12-31.

[2]. De la Guia Solaz, M., Bourke, A., Conway, R., Nelson, J., & ÓLaighin, G. (2010, August). Real-time low-energy fall detection algorithm with a Programmable Truncated MAC. In Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE (pp. 2423-2426). IEEE.

[3]. Drane, T. A., Rose, T. M., & Constantinides, G. A. (2014). On the systematic creation of faithfully rounded truncated multipliers and arrays. IEEE Transactions on Computers, 63(10), 2513-2525.

[4]. Ernst, D., Das, S., Lee, S., Blaauw, D., Austin, T., Mudge, T., et al. (2004). Razor: Circuit-level correction of timing errors for low-power operation. IEEE Micro, 24(6), 10-20.

[5]. Fojtik, M., Fick, D., Kim, Y., Pinckney, N., Harris, D., Blaauw, D., et al. (2012, February). Bubble Razor: An architecture-independent approach to timing-error detection and correction. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International (pp. 488-490). IEEE.

[6]. Hsiao, S. F., Jian, J. H. Z., & Chen, M. C. (2013). Low-cost FIR filter designs based on faithfully rounded truncated multiple constant multiplication/accumulation. IEEE Transactions on Circuits and Systems II: Express Briefs, 60(5), 287-291.

[7]. Prakash, J. K., & Hemalatha.M. (2015). The Effective Programmable Razor Based Truncated MAC for Digital Signal Processing. International Journal of Emerging Trends in Engineering Research (IJETER), 3(6), 292-296.

[8]. Sakurai, T., & Newton, A. R. (1990). Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas. IEEE Journal of Solid-state Circuits, 25(2), 584-594.

[9]. Wang, J. P., Kuang, S. R., & Chuang, Y. C. (2006, June). Design of reconfigurable low-power pipelined array multiplier. In Communications, Circuits and Systems Proceedings, 2006 International Conference on (Vol. 4, pp. 2277-2281). IEEE.