Reconfigurable computing encompasses many different architectures that allow configurability at the hardware level. The main motivation is to combine the high performance of hardwired Application Specific Integrated Circuits (ASICs) with the programming flexibility of microprocessors. The architectures offer different compromises between these two extremes, but the focus is often on highly parallel processing optimized for high data throughput rather than low-latency response. A popular arrangement of parallel processing elements (PEs) is a tile architecture with a configurable interconnect between the tiles. The functionality of the PEs can range from Boolean functions on individual bits to entire processors. In this project we have worked with an FPGA (Field Programmable Gate Array), which has fine-grained PEs (look-up tables with 4-bit input), and a computing array with coarse-grained PEs (24-bit ALUs).
On the relatively small FPGA (XC4085 XLA from Xilinx) we have implemented a solver for the level set equation and used it for the segmentation of medical images (Fig. 2, [1]). The FPGA was operated on a low-cost PCI card in a standard PC. We have used the computing array (XPP from PACT) for denoising images with a non-linear diffusion model (Fig. 4, [3]). Because the actual hardware was not available at first, the configurations were tested with a clock-accurate simulator and the results were generated with a software simulation.
In view of the memory wall problem, a huge advantage of free configurability is the possibility of incorporating all sorts of data-flow optimizations and parallelism into the implementation (Fig. 1). In particular, deep pipelines can be built that require little bandwidth and execute many operations in parallel (Fig. 3, [2]). So even though the devices run at low clock frequencies in the tens of MHz, they can easily outperform a GHz PC, since they do not suffer from a bandwidth problem and a hundred or more operations are performed in parallel in each clock cycle.
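To illustrate why such a pipeline needs so little bandwidth, the following Python sketch models a streaming 3x3 filter in software. It is not the XPP configuration from Fig. 3; the function name `stream_filter_3x3`, the two line buffers, and the omission of border handling are assumptions made for illustration. Each pixel is read from the input stream exactly once, and the nine multiply-accumulate operations per output are the ones a hardware pipeline could execute in the same clock cycle.

```python
def stream_filter_3x3(pixels, width, kernel):
    """Apply a 3x3 kernel to a row-major pixel stream, reading each pixel once.

    Hypothetical software model of a pipelined filter. Yields one value per
    interior pixel (borders are skipped). The kernel is applied without
    flipping, i.e. as a correlation.
    """
    # Two line buffers hold the two previous image rows (on-chip memory in hardware).
    row_minus2 = [0] * width
    row_minus1 = [0] * width
    # 3x3 register window holding the current neighborhood.
    window = [[0, 0, 0] for _ in range(3)]

    for i, p in enumerate(pixels):
        y, x = divmod(i, width)
        # Three vertically aligned pixels at column x: rows y-2, y-1, y.
        column = (row_minus2[x], row_minus1[x], p)
        # Advance the line buffers at this column.
        row_minus2[x] = row_minus1[x]
        row_minus1[x] = p
        # Shift the window one column to the left and append the new column.
        for r in range(3):
            window[r][0] = window[r][1]
            window[r][1] = window[r][2]
            window[r][2] = column[r]
        # The window is valid once three rows and three columns have arrived.
        if y >= 2 and x >= 2:
            yield sum(kernel[r][c] * window[r][c]
                      for r in range(3) for c in range(3))


if __name__ == "__main__":
    image = list(range(16))          # 4x4 test image, row-major
    box = [[1, 1, 1]] * 3            # 3x3 box sum
    print(list(stream_filter_3x3(image, 4, box)))
```

In this model the only external traffic is one pixel in and at most one result out per step; all reuse of neighboring pixels happens in the line buffers and the register window.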
The main disadvantages are the more tedious programming models based on hardware description languages and the lack of native floating point arithmetic. Image processing is characterized by low-precision input, so we can gain more parallelism by reducing the computational precision to match the input; this is why reconfigurable computing is popular in this area. In scientific computing, configuring double precision floating point arithmetic would consume too many resources. For iterative solvers, mixed precision methods solve this problem [4]. However, the programming complexity remains. Therefore, we are interested in high-level languages that can efficiently utilize the massive parallelism of reconfigurable devices.
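The idea behind mixed precision iterative solvers can be sketched in a few lines of Python. This is a generic iterative refinement scheme, not necessarily the method of [4]: the residual and the solution update are computed in double precision, while the inner Jacobi solver works on values rounded to single precision. The helper names, the choice of Jacobi, and the iteration counts are assumptions for illustration; on a reconfigurable device the inner solver would use whatever low-precision format fits the available resources.

```python
import struct


def to_f32(v):
    """Round a Python float (double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def matvec(A, x):
    """Dense matrix-vector product in double precision."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def jacobi_low(A, r, sweeps):
    """Approximately solve A c = r with Jacobi sweeps in emulated single precision."""
    n = len(r)
    A32 = [[to_f32(a) for a in row] for row in A]
    r32 = [to_f32(v) for v in r]
    c = [0.0] * n
    for _ in range(sweeps):
        new = []
        for i in range(n):
            s = r32[i]
            for j in range(n):
                if j != i:
                    # Every intermediate result is rounded to single precision.
                    s = to_f32(s - to_f32(A32[i][j] * c[j]))
            new.append(to_f32(s / A32[i][i]))
        c = new
    return c


def mixed_precision_solve(A, b, outer=20, inner=10):
    """Iterative refinement: double precision residual, low precision correction."""
    x = [0.0] * len(b)
    for _ in range(outer):
        # High precision: residual of the current solution.
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        # Low precision: cheap approximate correction.
        c = jacobi_low(A, r, inner)
        # High precision: accumulate the correction.
        x = [xi + ci for xi, ci in zip(x, c)]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    b = [6.0, 12.0, 14.0]  # exact solution is [1, 2, 3]
    print(mixed_precision_solve(A, b))
```

For this small diagonally dominant system the refinement loop should recover the solution [1, 2, 3] to close to double precision accuracy, even though the inner solver never works with more than single precision. Only the residual computation and the update need the expensive arithmetic.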
Figures
1. Some optimizations available in reconfigurable computing.
2. Segmentation of a brain tumor computed with the FPGA.
3. Configuration of a 3x3 filter for a 2D image in the XPP.
4. Non-linear diffusion as implemented on the XPP.
Bibliography