Reconfigurable Computing


Reconfigurable computing encompasses many different architectures that allow configurability at the hardware level. The main motivation is to combine the high performance of hardwired Application Specific Integrated Circuits (ASICs) with the programming flexibility of microprocessors. The architectures offer different compromises between these two extremes, but the focus is often on highly parallel processing optimized for high data throughput rather than low-latency response. A popular arrangement of parallel processing elements (PEs) is a tile architecture with a configurable interconnect between the tiles. The functionality of the PEs can range from Boolean functions on individual bits to entire processors. In this project we have worked with an FPGA (Field Programmable Gate Array), which has fine-grained PEs (look-up tables with 4-bit inputs), and with a computing array that has coarse-grained PEs (24-bit ALUs).

On the relatively small FPGA (XC4085 XLA from Xilinx) we have implemented a solver for the level set equation and used it for the segmentation of medical images (Fig. 2, [1]). The FPGA was operated on a low-cost PCI card in a standard PC. The computing array (XPP from PACT) was used for denoising images with a non-linear diffusion model (Fig. 4, [3]). Because the actual hardware was not available at first, the configurations were tested with a clock-accurate simulator and the results were generated with a software simulation.

In view of the memory wall problem, a huge advantage of free configurability is the possibility of incorporating all sorts of data-flow optimizations and parallelism into the implementation (Fig. 1). In particular, deep pipelines can be built that require little memory bandwidth and execute many operations in parallel (Fig. 3, [2]). So even though the devices run at low clock frequencies of tens of MHz, they can easily outperform a GHz PC, because they do not suffer from a bandwidth problem and perform a hundred or more operations in parallel in each clock cycle.
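To illustrate why such a pipeline needs so little bandwidth, the following is a minimal software sketch of a streaming 3x3 filter that reads every input pixel exactly once and keeps only two image lines plus three pixels in a local buffer. It is not the XPP configuration from Fig. 3; the function name `stream_filter_3x3`, the row-major pixel stream and the omission of border handling are assumptions made for this example.

```python
from collections import deque


def stream_filter_3x3(pixels, width, kernel):
    """Filter a row-major pixel stream with a 3x3 kernel.

    Every input pixel is consumed exactly once. One result is yielded per
    position whose full 3x3 neighbourhood has already been streamed in
    (valid region only, no border handling).
    """
    # Hypothetical on-chip storage: two image lines plus three pixels.
    history = deque(maxlen=2 * width + 3)
    for index, value in enumerate(pixels):
        history.append(value)
        row, col = divmod(index, width)
        if row >= 2 and col >= 2:
            # history[0] is pixel (row-2, col-2), history[-1] is (row, col).
            acc = 0
            for dy in range(3):
                for dx in range(3):
                    acc += kernel[dy][dx] * history[dy * width + dx]
            yield acc


if __name__ == "__main__":
    image = range(16)                  # 4x4 test image with values 0..15
    box = [[1, 1, 1]] * 3              # unnormalized box filter
    print(list(stream_filter_3x3(image, 4, box)))
```

In software the nine multiply-add operations of a window run one after another. In a configured pipeline they can be laid out as separate processing elements, so that one new pixel enters and one result leaves in every clock cycle.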

The main disadvantages are the more tedious programming models, which are based on hardware description languages, and the lack of native floating point arithmetic. Image processing is characterized by low precision input, so we can gain more parallelism by reducing the computational precision to match the input; this is why reconfigurable computing is popular in this area. In scientific computing, configuring double precision floating point arithmetic would consume too many resources. For iterative solvers, mixed precision methods solve this problem [4]. However, the programming complexity remains. Therefore, we are interested in high-level languages that can efficiently utilize the massive parallelism of reconfigurable devices.
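The idea behind mixed precision methods can be sketched with generic iterative refinement: a cheap low precision inner solver computes corrections, while the residual and the solution update are kept in high precision. The sketch below is not necessarily the scheme of [4]. The names `to_low`, `jacobi_low` and `mixed_precision_solve`, the Jacobi inner solver, and the emulation of low precision by rounding stored results to single precision are all assumptions made for this example.

```python
import struct


def to_low(x):
    """Round a double to single precision to emulate a low precision unit."""
    return struct.unpack("f", struct.pack("f", x))[0]


def jacobi_low(A, r, sweeps):
    """Approximately solve A c = r with Jacobi sweeps.

    Emulation only: each stored component is rounded to low precision,
    the intermediate sums are still evaluated by Python in double.
    """
    n = len(r)
    c = [0.0] * n
    for _ in range(sweeps):
        c = [
            to_low((r[i] - sum(A[i][j] * c[j] for j in range(n) if j != i)) / A[i][i])
            for i in range(n)
        ]
    return c


def mixed_precision_solve(A, b, outer=20, sweeps=5):
    """Iterative refinement: high precision residual, low precision correction."""
    n = len(b)
    x = [0.0] * n
    for _ in range(outer):
        # Residual in high precision (the small, expensive part).
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Correction from the low precision inner solver (the bulk of the work).
        c = jacobi_low(A, [to_low(v) for v in r], sweeps)
        # Solution update in high precision.
        x = [x[i] + c[i] for i in range(n)]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]  # diagonally dominant
    b = [6.0, 12.0, 14.0]                                    # exact solution (1, 2, 3)
    print(mixed_precision_solve(A, b))
```

Only the residual and the update require the wide data type, so most of the configured resources can be spent on many narrow arithmetic units instead of a few double precision ones.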

Figures

1. Some optimizations available in reconfigurable computing.


2. Segmentation of a brain tumor computed with the FPGA.

 

3. Configuration of a 3x3 filter for a 2D image in the XPP.


4. Non-linear diffusion as implemented on the XPP.

Bibliography

 


[1]
Steffen Klupsch, Markus Ernst, Sorin A. Huss, Martin Rumpf, and Robert Strzodka. Real time image processing based on reconfigurable hardware acceleration. In Proceedings of IEEE Workshop Heterogeneous Reconfigurable Systems on Chip, 2002.
[2]
Robert Strzodka. Image processing on the XPP. Evaluation study, August 2002.
[3]
Robert Strzodka. Hardware Efficient PDE Solvers in Quantized Image Processing. PhD thesis, University of Duisburg-Essen, December 2004.
[4]
Robert Strzodka and Dominik Göddeke. Pipelined mixed precision algorithms on FPGAs for fast and accurate PDE solvers from low precision components. In IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), pages 259–268, April 2006.