Anju M. I. * J. Mohan ** Beena M. I. ***

* Department of Electronics and Communication Engineering, New Prince College of Engineering and Technology, Chennai, Tamil Nadu, India.

** Department of Electronics and Communication Engineering, SRM Valliammai College of Engineering, Chennai, Tamil Nadu, India.

*** Department of Mathematics, Ezuthachan College of Pharmaceutical Sciences, Thiruvananthapuram, Kerala, India.

Abstract

Electromyography (EMG) is a diagnostic technique for assessing the health of the muscles and the motor neurons that control them. Motor neurons transmit electrical signals that cause muscles to contract and relax. An EMG translates these signals into graphs or numbers that help doctors make a diagnosis. The electrical signal picked up from the muscles contains a lot of noise. The Fast Fourier Transform (FFT) is one of the most widely used algorithms for calculating the Discrete Fourier Transform (DFT) because it reduces computing time and improves efficiency. A radix-2 FFT allows dynamic signals arising from muscle contractions to be analyzed efficiently, and it can be implemented precisely using the radix-4 Coordinate Rotation Digital Computer (CORDIC) algorithm. This paper deals with the calculation of muscle fatigue using EMG and proposes an FFT core implementation based on the CORDIC algorithm as a solution.

Introduction

Electromyography is a diagnostic procedure that assesses the health of the muscles and the motor neurons that control them. The motor neurons transmit electrical signals that make muscles contract and relax. An EMG translates these signals into numbers or graphs to help doctors make a diagnosis (Ali, Albarahany, & Quan, 2012).

Non-invasive and invasive electrodes record surface EMG and intramuscular EMG signals, respectively. Surface-detected signals are now preferred for obtaining information on the timing or intensity of superficial muscle activation. As electrophysiological signals, EMG signals are among the most useful in both medical and engineering fields (Nazmi, Rahman, Mazlan, Zamzuri, & Mizukawa, 2015). Recording EMG signals provides a basic method for understanding the behavior of the human body under normal and pathological conditions.

Whenever an EMG signal is acquired from a muscle, it is contaminated by different types of noise. The complicated pattern of the EMG therefore makes the signals very difficult to analyze and identify, especially during movement. To address this problem, the signal must be processed using the DFT before it is stored.

The Fast Fourier Transform can be used for signal analysis in surface electromyography. A radix-2 FFT allows dynamic signals arising from muscle contractions to be analyzed efficiently, and it can be implemented precisely using the radix-4 CORDIC algorithm.

1. Literature Survey

The pattern of the EMG, particularly during movement, makes the signals very difficult to analyze and classify, and the Fast Fourier Transform is the most efficient way to compute the DFT needed to process them. Liu et al. (2011) describe an efficient single-path delay commutator-feedback (SDC-SDF) radix-2 pipelined FFT architecture that includes log₂N − 1 SDC stages and one SDF stage. However, it achieves better efficiency at higher frequencies than at lower frequencies, where its hardware utilization is lower than that of the proposed design.

Wold and Despain (1984) give special attention to implementing high-performance FFTs in VLSI. VLSI implementations have constraints that differ from those of discrete implementations, which prompts another look at some of the standard FFT algorithms. A parallel-pipelined processor architecture also offers higher throughput with lower power consumption (Kannan & Srivatsa, 2007).

Han, Erdogan, Arslan, and Hasan (2005) emphasize low power consumption, which can be achieved by combining low-power algorithms and architectures in a hybrid approach. However, the comparative study of different FFT processors was conducted under ideal conditions, and real-time results vary, so improvement is needed.

Zhou, Peng, and Hwang (2009) propose an efficient mapping of the Single-path Delay Feedback (SDF) pipelined Fast Fourier Transform (FFT) architecture to Field-Programmable Gate Arrays (FPGAs). Improvement is required, as the architectural and algorithmic choices for increasing the throughput are not explained in any detail.

2. Radix-2 FFT Algorithm

Rani, Sarma, and Prasad (2012) discuss the radix-2 FFT algorithm by considering the calculation of the N = 2^v point DFT using the divide-and-conquer approach. The N-point data sequence is split into two N/2-point data sequences f₁(n) and f₂(n), corresponding to the even-numbered and odd-numbered samples of x(n), respectively, that is,

$$f_1(n) = x(2n), \qquad f_2(n) = x(2n+1), \qquad n = 0, 1, \ldots, \frac{N}{2} - 1$$

Thus f₁(n) and f₂(n) are obtained by decimating x(n) by a factor of 2, and hence the resulting FFT algorithm is called a decimation-in-time algorithm. The N-point DFT can now be expressed in terms of the DFTs of the decimated sequences as follows:

$$X(k) = F_1(k) + W_N^k F_2(k), \qquad k = 0, 1, \ldots, N - 1$$

where $W_N = e^{-j2\pi/N}$, and F₁(k) and F₂(k) are the N/2-point DFTs of the sequences f₁(m) and f₂(m), respectively. Since F₁(k) and F₂(k) are periodic with period N/2, F₁(k + N/2) = F₁(k) and F₂(k + N/2) = F₂(k).
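The decimation-in-time split described above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware design of this paper; the function names `dft` and `fft_dit` and the test vector are assumptions introduced for the example. It relies on the identity W_N^(k+N/2) = −W_N^k together with the periodicity of F₁(k) and F₂(k).

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used here only as a reference for comparison."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    f1 = fft_dit(x[0::2])  # F1(k): DFT of the even-numbered samples
    f2 = fft_dit(x[1::2])  # F2(k): DFT of the odd-numbered samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * f2[k]  # W_N^k * F2(k)
        out[k] = f1[k] + t
        out[k + n // 2] = f1[k] - t  # uses W_N^(k+N/2) = -W_N^k
    return out


# Hypothetical 8-point test vector, checked against the direct DFT.
x = [1, 2, 3, 4, 0, -1, -2, -3]
X = fft_dit(x)
assert all(abs(a - b) < 1e-9 for a, b in zip(X, dft(x)))
```

The recursion stops at one-point sequences, whose DFT is the sample itself.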

Direct computation of F₁(k) requires (N/2)² complex multiplications, and the same applies to F₂(k). A further N/2 complex multiplications are required to compute W_N^k F₂(k). Hence the computation of X(k) requires 2(N/2)² + N/2 = N²/2 + N/2 complex multiplications. This first step reduces the number of multiplications from N² to N²/2 + N/2, which is about a factor of 2 for large N. The decimation of the data sequence can be repeated again and again until the resulting sequences are reduced to one-point sequences. For N = 2^v, this decimation can be performed v = log₂N times. Thus the total number of complex multiplications is reduced to (N/2)log₂N, and the number of complex additions is N log₂N (Rani, Sarma, & Prasad, 2012).

As shown in Figure 1, the architecture implemented is the Radix-2 Single-path Delay Feedback (R2SDF) architecture. By storing the butterfly output in feedback shift registers, this architecture uses the registers more effectively. At each stage, as shown in Figure 2, a single data stream passes through the multiplier. It has the same number of butterfly units and multipliers as the R2MDC approach, but its memory requirement is much lower: N − 1 registers (He & Torkelson, 1996).

The FFT can also be decomposed using a first-half/second-half approach that divides the output sequence X(r) into increasingly smaller subsequences; this procedure is called the decimation-in-frequency (DIF) FFT.
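As a rough illustration of the first-half/second-half decomposition, the following Python sketch implements a recursive radix-2 DIF FFT and tallies the twiddle-factor multiplications, so the (N/2)log₂N figure can be checked for small N. The function name `fft_dif` and the test input are assumptions made for this example, and trivial multiplications by W⁰ = 1 are counted as multiplications.

```python
import cmath
from math import log2


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT.

    Returns (X in natural order, number of twiddle multiplications performed).
    The input length must be a power of two.
    """
    n = len(x)
    if n == 1:
        return list(x), 0
    half = n // 2
    # Butterfly between the first and second half of the input.
    top = [x[i] + x[i + half] for i in range(half)]
    bot = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
           for i in range(half)]
    even, m_even = fft_dif(top)  # yields X(0), X(2), X(4), ...
    odd, m_odd = fft_dif(bot)    # yields X(1), X(3), X(5), ...
    out = [0j] * n
    out[0::2] = even
    out[1::2] = odd
    return out, half + m_even + m_odd


N = 8
X, mults = fft_dif(list(range(N)))
assert mults == (N // 2) * int(log2(N))  # 12 multiplications, versus N^2 = 64
```

For N = 1024 the same recurrence gives 512 × 10 = 5120 multiplications, compared with 1024² = 1,048,576 for direct evaluation.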

The butterfly starts storing data in the FIFO when valid in is high. The butterfly has internal control logic consisting of three states. In the first state, it receives the real and imaginary data and stores them in the FIFO until the FIFO is full; it then moves to the second state. In the second state, it reads data from the FIFO while receiving new input data and processes each pair of inputs to generate two outputs: the inputs are added and subtracted, the subtraction results are stored in the FIFO, and the addition results are sent to the next stage. After the entire input block for the stage has been received, the state is incremented to the third state, in which the subtraction results stored in the FIFO during the previous state are sent to the next stage.

If valid in is high, newly received data is stored in the FIFO again. The output data (the real and imaginary Data out values) are connected to the inputs of the next stage, and the valid out signal acts as the valid in input of the next stage. In this way all 10 stages are connected to form a complete 1024-point FFT. The architecture is fully scalable to any FFT size: if the 1st stage is removed it becomes a 512-point FFT, and if the 2nd stage is also removed it becomes a 256-point FFT.

CORDIC (COordinate Rotation DIgital Computer), also known as Volder's algorithm, is a simple and efficient algorithm for calculating hyperbolic and trigonometric functions, typically converging by one digit (or bit) per iteration (Tang, Yu, Han, & Zhang, 2016). CORDIC is therefore a bit-by-bit algorithm. It is used when no hardware multiplier is available (e.g., in simple microcontrollers and FPGAs), since it requires only basic shift and add operations.

It calculates the trigonometric functions sine and cosine, as well as magnitude and phase (arctangent), to any desired precision. The idea behind CORDIC is to "rotate" the phase of a complex number by multiplying it by a succession of constant values (CORDIC FAQ, n.d.).

However, the multipliers can all be powers of 2, so the multiplications can be done with just shifts and adds in binary arithmetic; no actual multiplier is required. There are two inputs, a real input and an imaginary input, which are stored in an array. The 8-point FFT is generated using a series of adders and multipliers. CORDIC is used in the multiplier section, where the authors apply it at the stage level.
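The rotation idea can be illustrated with a minimal Python sketch of the basic (radix-2) CORDIC iteration in rotation mode. It is an assumption-laden model rather than the core described here: it uses floating-point arithmetic in place of fixed-point shifts, the name `cordic_rotate` and the 16-iteration default are chosen for the example, and it does not model the radix-4 variant mentioned in the abstract.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians using CORDIC micro-rotations.

    Converges for |angle| up to about 1.74 rad (the sum of the table angles).
    """
    # Elementary angles atan(2^-i); in hardware this is a small lookup table.
    atan_table = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; the total gain is a constant.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Multiplying by 2^-i is a right shift in fixed-point hardware.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atan_table[i]
    return x / gain, y / gain


# Multiplying 1 + 0j by the twiddle factor W_8^1 = exp(-j*pi/4)
# is a rotation by -pi/4.
re, im = cordic_rotate(1.0, 0.0, -math.pi / 4)
assert abs(re - math.cos(math.pi / 4)) < 1e-3
assert abs(im + math.sin(math.pi / 4)) < 1e-3
```

Because the gain depends only on the iteration count, a hardware design can fold the final division into a single constant scaling step.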

3. Pseudocode for the Simulation

Step 1: If valid = '1', start receiving data and store it in the FIFO until it is full. Then check the FIFO; if FIFO = '1', go to Step 2.

Step 2: Read and process data from the FIFO. Store the subtraction result in the FIFO and send the addition result out as output.

Step 4: Read data from the FIFO, multiply it by the twiddle factor, and send the result to the output.
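The steps above can be modelled in software as a cascade of single-path delay feedback stages. The Python sketch below is a simplified behavioral model under several assumptions: the class `SdfStage` and function `sdf_fft` are hypothetical names, the valid in/valid out handshake is replaced by a free-running sample counter, arithmetic is floating point, and the outputs are reordered from bit-reversed to natural order at the end.

```python
import cmath
from collections import deque


class SdfStage:
    """One radix-2 SDF butterfly stage for an n-point DIF block."""

    def __init__(self, n):
        self.half = n // 2
        self.fifo = deque([0j] * self.half)  # feedback FIFO of depth n/2
        self.twiddle = [cmath.exp(-2j * cmath.pi * k / n) for k in range(self.half)]
        self.count = 0

    def step(self, x):
        k = self.count % self.half
        filling = (self.count // self.half) % 2 == 0
        old = self.fifo.popleft()
        if filling:
            # Store the new input; emit a stored difference times its twiddle.
            self.fifo.append(x)
            y = old * self.twiddle[k]
        else:
            # Butterfly: the sum goes out, the difference goes back to the FIFO.
            self.fifo.append(old - x)
            y = old + x
        self.count += 1
        return y


def sdf_fft(samples):
    """Stream samples through log2(N) stages; N must be a power of two >= 2."""
    n = len(samples)
    stages, size = [], n
    while size >= 2:
        stages.append(SdfStage(size))
        size //= 2
    outputs = []
    for x in list(samples) + [0j] * (n - 1):  # trailing zeros flush the pipeline
        for stage in stages:
            x = stage.step(x)
        outputs.append(x)
    stream = outputs[n - 1:]  # the first valid output appears after N-1 samples
    bits = n.bit_length() - 1
    result = [0j] * n
    for i, value in enumerate(stream):
        result[int(format(i, f"0{bits}b")[::-1], 2)] = value  # undo bit reversal
    return result


x = [complex(v) for v in (1, 2, 3, 4, 0, -1, -2, -3)]
reference = [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / 8) for m in range(8))
             for k in range(8)]
assert all(abs(a - b) < 1e-9 for a, b in zip(sdf_fft(x), reference))
```

In this model the FIFO depths are N/2, N/4, …, 1, which add up to N − 1 storage elements, and a 1024-point transform would use 10 such stages.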

4. Results and Discussions

4.1 FFT 8 Simulation Results

The simulation results for the 8-point FFT are discussed in the following sections.

Figure 3 shows the behavior when valid in is kept high and the real in and image in inputs are applied: valid out goes high and the core produces real data out and image data out.

Figure 4 shows the MATLAB result, which is compared with the Verilog result to ensure that the outputs match.

Figure 5 shows that the Verilog output for the 8-point FFT matches the MATLAB simulation output for the 8-point FFT, confirming the result.

Figure 6 shows the Stage 1 output comparison between Verilog and MATLAB, confirming the output.

Figure 7 shows the Stage 2 output comparison between Verilog and MATLAB, again confirming that the outputs match.

4.3 MATLAB Simulation Output for FFT

The result is verified for the 8-point FFT. Because the 8-point FFT is the basic building block of this architecture and remains the same for all stages, the design also works for 16, 32, 64, …, 1024 points.

To verify the 1024-point FFT, two signals with frequencies of 50 Hz and 120 Hz were generated and added together, and the sum was used as the input to the 1024-point FFT. In the frequency-domain view, the FFT output is expected to show peaks at 50 Hz and 120 Hz. Both the MATLAB and Verilog outputs were plotted, and the two match. The Verilog output shows some loss of precision and noise because it uses a 16-bit fixed-point format, whereas MATLAB uses the 64-bit IEEE floating-point format, so accuracy is reduced. The peaks appear at the correct frequencies, confirming that the implementation works.
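The two-tone check described above can be reproduced in outline with the Python sketch below. Several details are assumptions, since the text does not state them: a sampling rate of 1024 Hz (chosen so that 50 Hz and 120 Hz fall exactly on FFT bins), tone amplitudes of 0.4, and a reduced-precision model in which only the input samples are rounded to 16-bit fixed point with 15 fractional bits. A real fixed-point core would also round its internal arithmetic.

```python
import cmath
import math


def fft(x):
    """Compact recursive radix-2 FFT used as the floating-point reference."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)]
            + [even[k] - tw[k] for k in range(n // 2)])


def quantize(value, frac_bits=15):
    """Round to a fixed-point grid with `frac_bits` fractional bits (assumed format)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


N = 1024
FS = 1024.0  # assumed sampling rate in Hz; not given in the text

signal = [0.4 * math.sin(2 * math.pi * 50 * n / FS)
          + 0.4 * math.sin(2 * math.pi * 120 * n / FS) for n in range(N)]
quantized = [quantize(v) for v in signal]


def peak_frequencies(x, count=2):
    """Return the `count` strongest frequencies in the first half of the spectrum."""
    mags = [abs(v) for v in fft(x)[:N // 2]]
    bins = sorted(range(N // 2), key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k * FS / N for k in bins)


# Largest spectral deviation introduced by rounding the input samples.
error = max(abs(a - b) for a, b in zip(fft(signal), fft(quantized)))

print(peak_frequencies(signal), peak_frequencies(quantized), error)
```

Under these assumptions both spectra should peak at 50 Hz and 120 Hz, with the quantized version differing only by a small noise floor.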

Figure 8 shows the simulation results for the 1024-point FFT, Figure 9 shows the input signal plot, and Figure 10 shows the input signal generated in MATLAB and supplied as the input signal to Xilinx.

When valid in is kept high and the real in and image in inputs are applied, valid out goes high and the real data out and image data out values are obtained.

Conclusion

The biggest concern in using EMG is noise, which corrupts the electrical signal generated by the muscles. This project presents an efficient FFT core for filtering the electrical signals in electromyography and introduces the use of CORDIC in computing the FFT. Although the system presented here is efficient, it has its own limitations. The current approach can be improved by using a radix-4 FFT, since the authors note that radix-4 completes one cycle for a given waveform and therefore improves performance. The biggest concern with the present approach is that it is efficient for static signals, but for dynamic signals, such as those recorded while running, radix-2 is less efficient than radix-4. As an enhancement, the authors suggest using a radix-4 FFT to build the FFT core. Performance could be improved considerably by adopting a wavelet form for building the FFT signal. Another possible enhancement is to pass a 3-D image instead of a 2-D signal for calculating the EMG.

References

[1]. Ali, A. A., Albarahany, & Quan, L. (2012, October). EMG signals detection technique in voluntary muscle movement. In 2012 6th International Conference on New Trends in Information Science, Service Science and Data Mining (ISSDM2012) (pp. 738-742). IEEE.

[2]. Ali, A., Kanagasabapathi, C., & Yellampalli, S. S. (2017, December). Pipelined-scalable FFT core with optimized custom floating point engine for OFDM system. In 2017 International Conference on Electrical, Electronics, Communication, Computer, and Optimization Techniques (ICEECCOT) (pp. 1-6). IEEE. https://doi.org/10.1109/ICEECCOT.2017.8284590

[3]. Bansal, P., Dhaliwal, B. S., & Gill, S. S. (2014, March). Memory-efficient Radix-2 FFT processor using CORDIC algorithm. In 2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE) (pp. 1-5). IEEE. https://doi.org/10.1109/ICGCCEE.2014.6922202

[4]. Belabed, T., Jemmali, S., & Souani, C. (2018, March). FFT implementation and optimization on FPGA. In 2018 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP) (pp. 1-6). IEEE. https://doi.org/10.1109/ATSIP.2018.8364454

[6]. Han, W., Erdogan, A. T., Arslan, T., & Hasan, M. (2005, January). The development of high performance FFT IP Cores through hybrid low power algorithmic methodology. In Proceedings of the 2005 Asia and South Pacific Design Automation Conference (pp. 549-552). ACM. https://doi.org/10.1145/1120725.1120959

[7]. He, S., & Torkelson, M. (1996, April). A new approach to pipeline FFT processor. In Proceedings of International Conference on Parallel Processing (pp. 766-770). IEEE. https://doi.org/10.1109/IPPS.1996.508145

[8]. Joseph, E., Rajagopal, A., & Karibasappa, K. (2012, December). FPGA implementation of Radix-2 FFT processor based on Radix-4 CORDIC. In 2012 Nirma University International Conference on Engineering (NUiCONE) (pp. 1-6). IEEE. https://doi.org/10.1109/NUICONE.2012.6493231

[11]. Nazmi, N., Rahman, M. A. A., Mazlan, S. A., Zamzuri, H., & Mizukawa, M. (2015, March). Electromyography (EMG) based signal analysis for physiological device application in lower limb rehabilitation. In 2015 2nd International Conference on Biomedical Engineering (ICoBE) (pp. 1-6). IEEE. https://doi.org/10.1109/ICoBE.2015.7235878

[12]. Rani, S., Sarma, T. C., & Prasad, K. S. (2012). Text file encryption using FFT technique in LabVIEW 8.6. IJRET: International Journal of Research in Engineering and Technology, 1(01), 2319-1163.

[13]. Saenz, S. J., Cisneros, S. O., & Dominguez, J. R. (2015, November). FPGA design and implementation of radix-2 fast Fourier transform algorithm with 16 and 32 points. In 2015 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC) (pp. 1-6). IEEE. https://doi.org/10.1109/ROPEC.2015.7395113

[14]. Tang, A., Yu, L., Han, F., & Zhang, Z. (2016, March). CORDIC-based FFT real-time processing design and FPGA implementation. In 2016 IEEE 12th International Colloquium on Signal Processing & Its Applications (CSPA) (pp. 233-236). IEEE. https://doi.org/10.1109/CSPA.2016.7515837

[15]. Wold, E. H., & Despain, A. M. (1984). Pipeline and parallel-pipeline FFT processors for VLSI implementations. IEEE Transactions on Computers, 33(5), 414-426. https://doi.org/10.1109/TC.1984.1676458

FPGA Implementation of Radix-2 FFT Processor Based On Cordic Algorithm for Electromyography
