

# Reconfigurable Power Efficient High Throughput Digital System

R.N. Patil

Asso. Professor in Electronics Engineering,  
DKTE'S Textile & Engineering Institute, Ichalkaranji.  
INDIA  
ravinpatil@gmail.com

S. Subbaraman

Professor in Electronics Engineering,  
Walchand College of Engineering, Sangli. INDIA.  
shailasubbaraman@yahoo.co.in

**Abstract--** In various real-time applications, such as Computer Graphics, Virtual Reality, System Control and Digital Signal Processing, a sequence of data sets must be processed by multiple functional units. The processing can be sequential using a pipelined architecture, parallel using a parallel architecture, or both in the case of mixed architectures. Power reduction has always been an important consideration in electronic systems. At the same time, the demand for real-time processing of the complex algorithms required by state-of-the-art DSP applications has stressed the ability of electronic systems to handle high input data rates, calling for high-throughput systems. This paper deals with the design of a dynamic-frequency-scaling-enabled platform for a power-efficient, high-throughput digital system in a multiprocessing environment, and with its implementation in an FPGA. The dynamic frequency scaling unit reconfigures the main clock of the FPGA according to the processing ability of the individual subsystems of a multiprocessing system, or the volume of data each of them must process. The effect of this scheme on the energy saving and throughput of the multiprocessing system is evaluated in the context of image processing by implementing a few image processing algorithms, such as contrast stretching, Sobel filter, image thresholding and Gaussian filter. The details of the concept, the implementation and the results are presented here.

**Keywords:** Reconfigurable, FPGA, Throughput, Power Efficient, Digital System

## I. INTRODUCTION

State-of-the-art electronic systems run many demanding real-time applications and therefore perform complex computations on hardware elaborate enough to support that complexity [1]. The portability of recent electronic gadgets further necessitates the use of batteries, which restricts the maximum power dissipation of the system. Since dynamic power dissipation is a linear function of the operating frequency, applying an appropriate clock to each functional block of the system is one way to optimize power consumption. Such a system then works in multiple clock domains and may need to handle the associated synchronization issues [2-5]. This paper deals with deriving multiple clocks, according to the functional requirements of the partitions of the system, to optimize both the power dissipation and the throughput of the system. The concept implemented in this research work is detailed below.
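The linear dependence on frequency follows from the standard first-order model of dynamic (switching) power in CMOS logic, shown below with symbols introduced here only for illustration: $\alpha$ is the switching activity, $C_L$ the switched load capacitance, $V_{DD}$ the supply voltage and $f_{clk}$ the clock frequency. Static (leakage) power is ignored in this model.

$$P_{dyn} = \alpha\,C_L\,V_{DD}^{2}\,f_{clk}, \qquad E_{cycle} = \frac{P_{dyn}}{f_{clk}} = \alpha\,C_L\,V_{DD}^{2}$$

Because the energy per clock cycle does not depend on $f_{clk}$ at a fixed supply voltage, lowering a block's clock frequency saves energy to the extent that it removes clock edges that do no useful work.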

## II. CONCEPT OF THE SYSTEM

The advantage of pipelined and parallel architectures is that both can handle data at rates n times higher than a conventional architecture, where n is the level of pipelining or parallelism [6-7]. If reducing power consumption has priority over high data rates, both architectures can instead reduce power considerably, because the original data rate can then be sustained at a lower power-supply voltage.
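The voltage-scaling argument for a parallel architecture can be made concrete with the standard first-order dynamic-power model $P = C V^{2} f$ (switching activity folded into $C$). As an idealization that neglects multiplexing and other overheads, assume an n-way parallel version keeps the original data rate with n copies of the hardware, each clocked at $f/n$, and that the slower clock allows the supply to be lowered from $V_0$ to $\beta V_0$ with $0 < \beta \le 1$; the symbols $V_0$ and $\beta$ are introduced here for illustration.

$$P_{seq} = C\,V_0^{2}\,f, \qquad P_{par} = (n\,C)\,(\beta V_0)^{2}\,\frac{f}{n} = \beta^{2}\,P_{seq}$$

For instance, a hypothetical $\beta = 0.6$ would give $P_{par} = 0.36\,P_{seq}$.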

Consider the architecture of Figure 1. First, the tasks performed in parallel by the different sub-blocks may have different time complexities. Second, the time taken to finish each task also depends on the volume of data that task has to process. In such cases the clock frequency of the digital system is generally determined by the slowest task. If all task blocks are driven by the same clock, the tasks complete at different times depending on their complexity and data volume, and the faster blocks, even after completing their tasks, must wait until the slowest block finishes before feeding their processed data to the next stage. Meanwhile the clock keeps toggling at the inputs of the faster blocks that have already finished, causing unnecessary power dissipation. In addition, since the internal latch period must exceed the time taken by the slowest task, overall throughput also suffers.



Figure 1: Parallel Architecture
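The idle-clock problem described for Figure 1 can be illustrated with a toy cycle-count model. It is a minimal sketch that assumes, hypothetically, that each sub-block's task needs a known number of clock cycles; the names `Block`, `common_clock` and `scaled_clocks` and the cycle counts are illustrative and are not taken from the paper's implementation. With a common clock, the lighter block keeps receiving edges after it has finished; if each block's clock is instead scaled in proportion to its workload, which is the idea pursued in this paper, all blocks finish together and no edges are wasted.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A processing sub-block (e.g. T1 or T2) and the clock cycles its task needs."""
    name: str
    cycles: int


def common_clock(blocks, f_hz):
    """All blocks share one clock; the next stage can latch only when the
    slowest block has finished.

    Returns (finish time in s, idle clock edges received by each block)."""
    slowest = max(b.cycles for b in blocks)
    idle = {b.name: slowest - b.cycles for b in blocks}
    return slowest / f_hz, idle


def scaled_clocks(blocks, f_fast_hz):
    """The block with the most work keeps f_fast_hz; every other block is
    slowed in proportion to its workload, so each one receives exactly the
    edges it needs and all blocks finish at the same instant.

    Returns (finish time in s, clock frequency of each block in Hz)."""
    slowest = max(b.cycles for b in blocks)
    freqs = {b.name: f_fast_hz * b.cycles / slowest for b in blocks}
    return slowest / f_fast_hz, freqs


if __name__ == "__main__":
    blocks = [Block("T1", 4000), Block("T2", 1000)]  # hypothetical cycle counts
    print(common_clock(blocks, 100e6))
    print(scaled_clocks(blocks, 100e6))
```

With these hypothetical counts, T2 receives 3000 idle edges under a common 100 MHz clock, whereas a 25 MHz clock for T2 lets it finish at the same instant as T1 (40 µs) with no idle edges.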

To overcome this problem, an approach to designing a multiprocessor digital system by dynamically scaling the clock signals of its different processing blocks is proposed. As a proof of concept, a digital system implementing image processing algorithms of different complexities and processing times was developed on a Xilinx Virtex-5 FPGA [8] using the ISE environment. The details of this approach and the results are presented below.

The block diagram of the proposed system is shown in Figure 2, where two sub-blocks, T1 and T2, implement two algorithms of different complexity. T1 and T2 are fed with different clocks, CLK1 and CLK2 respectively, which are derived dynamically from the system clock using the DCM core, while the output stage is fed with the system clock.



Figure 2: Proposed System

The same input image (size: 64 x 64) is fed to sub-blocks T1 and T2, which implement two different image processing algorithms, viz. Boundary Extraction and Contrast Stretching respectively. Since the time complexity of the algorithm implemented in T1 is higher than that of the algorithm in T2, the frequency of CLK1 is made higher than that of CLK2, roughly by the ratio of the time complexities of the two algorithms. The outputs of the two sub-blocks are ANDed together and fed to the next functional block (here, the output stage) for further processing, viz. image fusion. With this approach, the unnecessary clock transitions in T2 during the interval between the completion of its task and the triggering of the output stage are avoided, resulting in power saving.

The scaling factor is specific to the algorithms implemented in the processing blocks. In general, and in real-time applications in particular, this scaling factor must be computed and fed internally to the appropriate hardware so that the scaled clock frequencies are generated dynamically. In the experiments, the clock signals were generated dynamically from a scaling factor that was itself generated dynamically.
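Computing the scaling factor and mapping it to a realizable clock ratio can be sketched as follows. This is a minimal sketch, assuming each block's required clock frequency is already known in hertz and assuming numerator and denominator ranges of 2 to 32 and 1 to 32 for the DCM (illustrative values; check the device documentation for the real limits). The helper names `scaling_factor` and `best_n_d` are hypothetical. Because the search space is small, an exhaustive search finds the realizable N/D closest to the exact ratio, with ties resolved in favour of the smaller denominator.

```python
from fractions import Fraction

# Assumed DCM limits (hypothetical here; verify against the device documentation).
N_RANGE = range(2, 33)  # numerator (multiplier) N
D_RANGE = range(1, 33)  # denominator (divider) D


def scaling_factor(f_required_hz: int, f_ref_hz: int) -> Fraction:
    """Exact ratio of the clock a block needs to the reference (system) clock."""
    return Fraction(f_required_hz, f_ref_hz)


def best_n_d(target: Fraction) -> tuple[int, int]:
    """Return the (N, D) pair whose N/D is closest to target.

    Ties on the error are broken in favour of the smaller denominator."""
    return min(
        ((n, d) for n in N_RANGE for d in D_RANGE),
        key=lambda nd: (abs(Fraction(nd[0], nd[1]) - target), nd[1]),
    )


if __name__ == "__main__":
    f_ref = 100_000_000                          # hypothetical 100 MHz system clock
    target = scaling_factor(37_500_000, f_ref)   # block needs 37.5 MHz
    n, d = best_n_d(target)
    print(n, d, f_ref * n / d)
```

For a 100 MHz system clock and a block that needs 37.5 MHz, the exact ratio 3/8 is realizable, so the search returns N = 3 and D = 8. When the exact ratio is out of range (for example 1/40), the closest realizable pair is returned instead.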

Since the Digital Clock Manager (DCM) provided by Xilinx can accept a 16-bit input, formed by concatenating a numerator N (higher byte) and a denominator D (lower byte), and generate a clock at N/D times the reference clock frequency, the DCM was conveniently used to generate the required scaled clock frequencies dynamically.
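The 16-bit control word described above can be modelled in a few lines. The sketch follows only the layout stated in the text (N in the higher byte, D in the lower byte); the register address and exact bit encoding used by the real Xilinx DCM reconfiguration interface are not given here and may differ, so the Virtex-5 documentation should be consulted before relying on it. The function names are hypothetical.

```python
def pack_nd(n: int, d: int) -> int:
    """Concatenate N (higher byte) and D (lower byte) into one 16-bit word."""
    if not (0 <= n <= 0xFF and 1 <= d <= 0xFF):
        raise ValueError("N must fit in one byte and D must be in 1..255")
    return (n << 8) | d


def unpack_nd(word: int) -> tuple[int, int]:
    """Split a 16-bit word back into (N, D)."""
    return (word >> 8) & 0xFF, word & 0xFF


def scaled_frequency(f_ref_hz: float, word: int) -> float:
    """Output frequency = (N / D) x reference frequency."""
    n, d = unpack_nd(word)
    return f_ref_hz * n / d


if __name__ == "__main__":
    word = pack_nd(3, 8)
    print(hex(word), scaled_frequency(100e6, word))
```

For example, `pack_nd(3, 8)` yields `0x308`, and with a 100 MHz reference this word corresponds to a 37.5 MHz output clock.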

## III. RESULTS AND CONCLUSIONS

To verify that dynamically scaling the clock frequency reduces power dissipation, the experiments were carried out in four groups with three sets in each group. Group A corresponds to three sets of experiments with the Sobel Filter and Image Thresholding algorithms implemented in T1 and T2; Group B to three sets with the Contrast Stretching algorithm in T1 and the Boundary Extraction algorithm in T2; Group C to three sets with the Contrast Stretching algorithm in T1 and the Image Thresholding algorithm in T2; and Group D to three sets with the Sobel Filter and Gaussian Filter algorithms. The results, obtained after downloading the bit-stream into the Virtex-5 FPGA using Xilinx ISE 14.4, are given in Table I. The plus (+) and minus (-) signs within brackets in some table cells indicate an increase and a decrease, respectively, in the corresponding column parameter for the scaled clocks relative to the system clock.

The results show that dynamically scaling the clock frequencies by different factors, according to the complexity of the hardware modules working in parallel, improves throughput substantially, by 14% to 91%, and reduces the Power-Delay Product (PDP) by about 3% to 39%, compared with the conventional system in which all the implemented algorithms run with the same clock frequency. The improvement depends on the ratio of the time complexities of the two algorithms working in parallel. The approach therefore increases the throughput of a multiprocessor digital system by reducing the execution time, at the cost of an increase in power dissipation of 5.33% to 29.56% relative to a similar digital system operating on a single clock. It also improves the PDP, which is considered a figure of merit for high-speed digital systems.
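As a consistency check, the short script below recomputes the Power-Delay Product column of Table I from the Delay and Power columns (1 µs x 1 mW = 1 nJ, so the product is divided by 1000 to obtain µJ), together with the percentage changes reported in the Scaled : System rows. The delay and power figures are copied from Table I; the function names are illustrative.

```python
# Delay (us) and power (mW) copied from Table I:
# group -> ((system-clock delay, power), (scaled-clock delay, power))
TABLE_I = {
    "A": ((216, 1777), (189, 1970)),
    "B": ((222, 1766), (158, 1950)),
    "C": ((83, 1538), (48, 1620)),
    "D": ((845, 1573), (441, 2038)),
}


def pdp_uj(delay_us, power_mw):
    """Power-Delay Product in uJ (1 us x 1 mW = 1 nJ = 0.001 uJ)."""
    return delay_us * power_mw / 1000.0


def pct_change(system, scaled):
    """Signed percentage change of the scaled-clock value w.r.t. the system clock."""
    return 100.0 * (scaled - system) / system


def summarize(table):
    """Return group -> (delay %, power %, PDP system, PDP scaled, PDP %)."""
    out = {}
    for group, ((d_sys, p_sys), (d_sc, p_sc)) in table.items():
        e_sys, e_sc = pdp_uj(d_sys, p_sys), pdp_uj(d_sc, p_sc)
        out[group] = (pct_change(d_sys, d_sc), pct_change(p_sys, p_sc),
                      e_sys, e_sc, pct_change(e_sys, e_sc))
    return out


for g, (dd, dp, e_sys, e_sc, de) in summarize(TABLE_I).items():
    print(f"{g}: delay {dd:+.2f}%  power {dp:+.2f}%  "
          f"PDP {e_sys:.2f} -> {e_sc:.2f} uJ ({de:+.2f}%)")
```

The recomputed changes agree with Table I to within 0.01 percentage points; the tabulated percentages appear to be truncated rather than rounded (for example, 2.99% against a computed 2.997% for the PDP of Group A).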

TABLE I. EXPERIMENTAL RESULTS

| Group | Clock           | T1 algorithm / image size | T2 algorithm / image size | Delay (μs) | Power (mW)  | Power-Delay Product (μJ) | Throughput |
|-------|-----------------|---------------------------|---------------------------|------------|-------------|--------------------------|------------|
| A     | System clock    | Sobel Filter              | Image Thresholding        | 216        | 1777        | 384                      | 37.9       |
|       | Scaled clocks   |                           |                           | 189        | 1970        | 372.33                   |            |
|       | Scaled : System | 64 x 64 pixels            | 64 x 64 pixels            | 12.5% (-)  | 10.86% (+)  | 2.99% (-)                | 14% (+)    |
| B     | System clock    | Contrast Stretching       | Boundary Extraction       | 222        | 1766        | 392                      |            |
|       | Scaled clocks   |                           |                           | 158        | 1950        | 308.1                    |            |
|       | Scaled : System | 64 x 64 pixels            | 64 x 64 pixels            | 28.82% (-) | 10.41% (+)  | 21.41% (-)               | 40% (+)    |
| C     | System clock    | Contrast Stretching       | Image Thresholding        | 83         | 1538        | 128                      |            |
|       | Scaled clocks   |                           |                           | 48         | 1620        | 77.76                    | 170        |
|       | Scaled : System | 64 x 64 pixels            | 64 x 64 pixels            | 42.16% (-) | 5.33% (+)   | 39% (-)                  | 66% (+)    |
| D     | System clock    | Gaussian Filter           | Sobel Filter              | 845        | 1573        | 1329                     |            |
|       | Scaled clocks   |                           |                           | 441        | 2038        | 898.75                   |            |
|       | Scaled : System | 128 x 128 pixels          | 64 x 64 pixels            | 47.81% (-) | 29.56% (+)  | 32.38% (-)               | 91% (+)    |

REFERENCES

[8] www.xilinx.com