



The CS2412 is an online programmable, pipelined 1024-point FFT/IFFT core. It can process continuous data streams at throughput rates of up to 50 Msamples/Sec. This highly integrated application-specific silicon core is the pipelined version of the CS2411 and is available in both ASIC and FPGA versions, handcrafted by Amphion for maximum performance with minimal power consumption and silicon area.



Figure 1: CS2412 Architecture

# FEATURES

- On-line programmable FFT/IFFT core
- Pipelined architecture
- 16-bit complex input/output in two's complement format (32-bit complex word)
- 16-bit twiddle factors generated inside the core
- 18-bit internal accuracy
- Programmable shift down control
- Radix-4 architecture
- Simultaneous loading/downloading supported
- Both input and output in normal order
- No external memory required
- Optimized for both ASIC and FPGA technologies with the same functionality
- Fully synchronous design

# **APPLICATIONS**

- Communications modulation schemes
- Image processing
- Atmospheric imaging
- Spectral representation

## FAST FOURIER TRANSFORM

FFT (Fast Fourier Transform) and IFFT (Inverse Fast Fourier Transform) are algorithms for computing the 2<sup>p</sup>-point discrete Fourier transform and inverse discrete Fourier transform, as defined below:

FFT:
$$Y(k) = \sum_{n=0}^{N-1} X(n) W_{N}^{-nk}, \quad k = 0, 1, 2, \ldots, N-1 \qquad [1]$$

IFFT:
$$Y(k) = \frac{1}{N} \sum_{n=0}^{N-1} X(n) W_N^{nk}, \quad k = 0, 1, 2, \ldots, N-1 \qquad [2]$$

where N = 2<sup>p</sup> and W<sub>N</sub> = $e^{-j2\pi/N}$.
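As a reference point, equations [1] and [2] can be evaluated directly. The sketch below is a plain O(N²) evaluation in Python, not the radix-4 algorithm used by the core; the function names are illustrative, and the sign convention follows the equations exactly as written above.

```python
import cmath

def w(n_points: int) -> complex:
    """W_N = exp(-j*2*pi/N), as defined in the text."""
    return cmath.exp(-2j * cmath.pi / n_points)

def dft_eq1(x):
    """Equation [1]: Y(k) = sum_n X(n) * W_N^(-n*k)."""
    n_points = len(x)
    wn = w(n_points)
    return [sum(x[n] * wn ** (-n * k) for n in range(n_points))
            for k in range(n_points)]

def idft_eq2(x):
    """Equation [2]: Y(k) = (1/N) * sum_n X(n) * W_N^(n*k)."""
    n_points = len(x)
    wn = w(n_points)
    return [sum(x[n] * wn ** (n * k) for n in range(n_points)) / n_points
            for k in range(n_points)]

# Round trip on a small block (N = 4 = 2^2): [2] undoes [1].
block = [1 + 0j, 2 - 1j, 0 + 3j, -1 + 0j]
restored = idft_eq2(dft_eq1(block))
```

Applying [2] to the result of [1] returns the original block to within floating-point rounding, which is a quick sanity check on the pair of definitions.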

The computational complexity of the FFT and IFFT is proportional to $N\log_R N$, where R is the radix on which the FFT/IFFT is performed. A higher radix requires fewer multiplications but more simultaneous data accesses, which makes the circuits more complicated. The radix-4 algorithm offers a balance between computational and circuit complexity and is often used to construct higher-radix FFT computation units in high-performance FFT/IFFT hardware.
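To make the $N\log_R N$ figure concrete, the following sketch counts the stages needed for a 1024-point transform at radix 2 and radix 4. The helper name `radix_stages` is illustrative only, and the product is the proportionality term from the text, not an exact operation count.

```python
def radix_stages(n_points: int, radix: int) -> int:
    """Number of radix-R stages, log_R(N), for N an exact power of R."""
    stages = 0
    remaining = n_points
    while remaining > 1:
        if remaining % radix != 0:
            raise ValueError("N must be an exact power of the radix")
        remaining //= radix
        stages += 1
    return stages

for radix in (2, 4):
    stages = radix_stages(1024, radix)
    print(f"radix-{radix}: {stages} stages, N*log_R(N) = {1024 * stages}")
# radix-2: 10 stages, N*log_R(N) = 10240
# radix-4: 5 stages, N*log_R(N) = 5120
```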

# CS2412 SYMBOL AND PIN DESCRIPTION

Table 1 describes input and output ports (shown graphically in Figure 2) of the CS2412 1024-point FFT/IFFT core. Unless otherwise stated, all signals are active high and bit(0) is the least significant bit.





| Name | I/O | Width | Description |
|------|-----|-------|-------------|
| CLK | I | 1 | Data clock signal, rising edge active |
| NotRST | I | 1 | Asynchronous global reset signal, active LOW |
| TType | I | 1 | Static signal specifying the transform type,<br>0: FFT,<br>1: IFFT |
| SDC | I | 3 | Input signal specifying the number of bits for the additional scaling down operation, loaded when XBS is active and associated with the 1024-point block indicated by XBS. |
| Xre | I | 16 | Real component of input data X, in two's complement format |
| Xim | I | 16 | Imaginary component of input data X, in two's complement format |
| XBS | I | 1 | Input data X block start signal, active HIGH, associated with the first input data of the N-point block. The remaining N-1 data of the N-point data block are loaded into the core in the following N-1 data clock cycles in the natural order. |
| XBIP | O | 1 | Output signal indicating that loading of X is in progress. XBIP goes HIGH on the clock cycle after XBS is active and returns to LOW when the last data of the N-point block is loaded into the core. XBS is ignored while XBIP is HIGH. |
| YBS | O | 1 | Output data Y block start signal, active HIGH, asserted when the first data of the N-point transformed block is on the output port. The remaining N-1 data of the N-point transform result come out of the core in the following N-1 clock cycles in the natural order. |
| YAV | O | 1 | Output data Y available indicator, active HIGH, asserted together with every data word of the N-point transform result |
| YRe | O | 16 | Real component of output data Y, in two's complement format, valid only when YAV is HIGH |
| YIm | O | 16 | Imaginary component of output data Y, in two's complement format, valid only when YAV is HIGH |
| YOV | O | 1 | Output data Y overflow signal, active HIGH, asserted when overflow occurs during the transform of the output data block. |
| YSDC | O | 3 | Output signal indicating the SDC of the output data block |

## Table 1: CS2412 1024-Point FFT/IFFT Interface Signal Definitions
# FUNCTIONAL DESCRIPTION

The CS2412 performs decimation in frequency (DIF) radix-4 forward or inverse Fast Fourier Transforms on complex data. Data is loaded into its workspace in normal sequential (natural) order. The transformed data is returned in normal sequential order. It performs 1024-point FFT/IFFT using the following equations:

FFT:
$$Y(k) = \frac{1}{2^{4+SDC}} \sum_{n=0}^{N-1} X(n) W_{N}^{-nk}, \quad k = 0, 1, 2, \ldots, N-1 \qquad [3]$$

IFFT:
$$Y(k) = \frac{1}{2^{4+SDC}} \sum_{n=0}^{N-1} X(n) W_N^{nk}, \quad k = 0, 1, 2, \ldots, N-1 \qquad [4]$$

Where N is equal to 1024, SDC is the scaling down control signal, X(n) is the complex input data and Y(k) the complex output data. Both the real and imaginary components of input X(n) and output Y(k) are 16-bit numbers in two's complement format.
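The effect of the scale factor in equations [3] and [4] is easiest to see for a constant input, where every twiddle term at k = 0 equals 1 and Y(0) is simply N times the amplitude times the scale factor. The sketch below computes this ideal value with exact fractions; it ignores rounding and saturation inside the core, and the function names are hypothetical.

```python
from fractions import Fraction

N_POINTS = 1024

def scale_factor(sdc: int) -> Fraction:
    """1 / 2^(4 + SDC) from equations [3] and [4]; SDC is a 3-bit value."""
    if not 0 <= sdc <= 7:
        raise ValueError("SDC is a 3-bit signal (0..7)")
    return Fraction(1, 2 ** (4 + sdc))

def dc_bin_output(amplitude: int, sdc: int) -> Fraction:
    """Ideal Y(0) for a constant input X(n) = amplitude (all W terms are 1)."""
    return N_POINTS * amplitude * scale_factor(sdc)

for sdc in (0, 6, 7):
    print(sdc, dc_bin_output(1000, sdc))
```

With an amplitude of 1000, SDC = 6 gives unity gain at bin 0, while SDC = 0 gives an ideal value of 64000, which does not fit in a 16-bit two's complement output.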

The CS2412 achieves high data throughput rates of up to 50 Msamples/Sec by employing a pipelined architecture with fixed-point arithmetic and a pre-scaling strategy to handle possible overflow during computation. The core applies a 4-bit unconditional scaling down operation plus up to 7 bits of controlled scaling down specified by the input signal SDC, giving the user the gain control required in a specific application. The CS2412 core uses the radix-4 decimation in frequency (DIF) algorithm to perform the transform. It consists of five radix-4 pipelined stages with reshuffle buffers between stages and is capable of processing a continuous data stream. Both the input and output are in normal order (the ordinary time order).

The selection of the transform type (FFT/IFFT) is controlled by a static signal, whereas the scaling down control is applied on a block-by-block basis. The core detects possible overflow during computation and saturates overflow data accordingly.

In order to minimize the device size, the CS2412 uses a $2 \times \text{clock}$ internally. The input data is clocked in using the data clock while the core operates on the $2 \times \text{clock}$. The output data is also clocked out on the $2 \times \text{clock}$, although it changes only every two cycles of the $2 \times \text{clock}$. When implemented on FPGA devices, the $2 \times \text{clock}$ is generated by the on-chip PLL of the Apex 20KE device or the DLL of Virtex devices.

## WORD LENGTH

The internal wordlength of each radix-4 operation of the CS2412 is specified in Figure 3. The intermediate data stored in the reshuffle buffers are 16 bits wide (32 bits for complex numbers). The wordlength grows to 18 bits after the radix-4 butterfly. The twiddle multiplier takes the 18-bit butterfly output and the 16-bit twiddle factors and generates a 34-bit product. The product is then scaled and rounded to 16 bits for the next-stage radix-4 operation.
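The 18-bit and 34-bit widths quoted above follow from two's complement arithmetic on the extreme operand values. The check below is a sketch of that reasoning, assuming each butterfly output component is a signed sum of four 16-bit values; `signed_bits` is an illustrative helper, not part of any Amphion deliverable.

```python
def signed_bits(value: int) -> int:
    """Minimum two's complement width that can hold value."""
    if value >= 0:
        return value.bit_length() + 1
    return (~value).bit_length() + 1

MIN16, MAX16 = -(1 << 15), (1 << 15) - 1

# Radix-4 butterfly: each output component is a signed sum of four 16-bit values.
butterfly_extremes = (4 * MIN16, 4 * MAX16)
butterfly_width = max(signed_bits(v) for v in butterfly_extremes)

# Twiddle multiplier: 18-bit butterfly output times a 16-bit twiddle factor.
MIN18 = -(1 << 17)
product_width = signed_bits(MIN18 * MIN16)

print(butterfly_width, product_width)  # 18 34
```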



**Figure 3: Wordlength Specification** 

## FUNCTIONAL OPERATION

The core is capable of processing a continuous data stream. Loading of the input data is performed under the control of signal XBS, which may be asserted only while the output signal XBIP is de-asserted. XBS marks the first data of the 1024-point data block, and the data is clocked in on the rising clock edge. The remaining 1023 data points are loaded in the following 1023 clock cycles in the natural order. When the last data is loaded, signal XBIP returns to LOW. Loading of the next data block can be started by asserting XBS at any time from the next clock cycle after XBIP returns to LOW.

Signal YBS is asserted when the first of the result data appears on the output port. The rest of the result data is clocked out continuously in the following 1023 clock cycles. Signal YAV is asserted for the whole period during which the result is being output. Figure 4 illustrates the functional timing of the I/O signals.



Figure 4: Input/Output Functional Timing



## SHIFTING CONTROL

The kernel operation for the 1024-point transform consists of a radix-4 butterfly followed by a twiddle multiplication. Theoretically, in the worst case the result value may grow by a factor of up to 5.657 in the first stage. This occurs when the four input data to the radix-4 computation have the maximal absolute value and the twiddle angle is $\pi/4$. The final result reaching stage 5 may grow by a factor of up to 1303.793. This represents a possible wordlength growth of 11 bits. As the output is a 16-bit value and fixed-point arithmetic is employed in the core, it is necessary to be able to scale the result to avoid overflow while still obtaining a good dynamic range.
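The first-stage factor and the 11-bit figure can be reproduced with a few lines of arithmetic. The sketch below assumes the worst case is four full-scale inputs adding coherently (a factor of 4), followed by a $\pi/4$ rotation in which the real and imaginary parts add (a factor of $\cos\theta + \sin\theta = \sqrt{2}$). The overall factor of 1303.793 is taken from the text as given, not derived here.

```python
import math

# One radix-4 stage: four full-scale inputs add coherently (factor 4),
# then a rotation by pi/4 adds the real and imaginary parts (factor sqrt(2)).
theta = math.pi / 4
stage1_growth = 4 * (math.cos(theta) + math.sin(theta))

# Overall worst-case growth quoted in the text.
TOTAL_GROWTH = 1303.793
extra_bits = math.ceil(math.log2(TOTAL_GROWTH))

print(f"{stage1_growth:.3f}", extra_bits)  # 5.657 11
```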

Since both the input and the output word lengths are 16 bits, no bit growth can be allowed. Thus, the megafunction must be able to right-shift the internal result by up to 11 bits so that overflow can be avoided. The total 11-bit scaling down operation is distributed across the stages according to Table 2. When SDC is set to the maximal value, there will be no overflow for any input data.

#### Table 2: Number of Shifting Bits in Each Stage

| SDC | Stage<br>1 | Stage<br>2 | Stage<br>3 | Stage<br>4 | Stage<br>5 | Total |
|-----|------------|------------|------------|------------|------------|-------|
| 000 | 1          | 1          | 1          | 1          | 0          | 4     |
| 001 | 2          | 1          | 1          | 1          | 0          | 5     |
| 010 | 2          | 2          | 1          | 1          | 0          | 6     |
| 011 | 3          | 2          | 1          | 1          | 0          | 7     |
| 100 | 3          | 2          | 2          | 1          | 0          | 8     |
| 101 | 3          | 2          | 2          | 1          | 1          | 9     |
| 110 | 3          | 2          | 2          | 2          | 1          | 10    |
| 111 | 3          | 2          | 2          | 2          | 2          | 11    |

The first 4 bits of shift are mandatory. The remaining 7 bits are applied at the discretion of the user under the control of SDC.
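Table 2 can be held as a small lookup, which also makes it easy to confirm that the total shift for each setting is 4 + SDC, matching the exponent in equations [3] and [4]. The dictionary below is transcribed from Table 2; the names `STAGE_SHIFTS` and `total_shift` are illustrative.

```python
# Per-stage right shifts (stages 1 to 5), transcribed from Table 2, indexed by SDC.
STAGE_SHIFTS = {
    0b000: (1, 1, 1, 1, 0),
    0b001: (2, 1, 1, 1, 0),
    0b010: (2, 2, 1, 1, 0),
    0b011: (3, 2, 1, 1, 0),
    0b100: (3, 2, 2, 1, 0),
    0b101: (3, 2, 2, 1, 1),
    0b110: (3, 2, 2, 2, 1),
    0b111: (3, 2, 2, 2, 2),
}

def total_shift(sdc: int) -> int:
    """Total number of right shifts applied across the five stages."""
    return sum(STAGE_SHIFTS[sdc])

for sdc in sorted(STAGE_SHIFTS):
    print(f"SDC={sdc:03b}: stages={STAGE_SHIFTS[sdc]} total={total_shift(sdc)}")
```

Note that in this table the additional shifts are applied to the early stages first, so the per-stage shift never decreases as SDC increases.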

## COMPUTATION ACCURACY

A rounding technique is employed to achieve the maximal computation accuracy possible for the given word lengths. The core performs a round-to-nearest operation to keep the loss in accuracy minimal. When an intermediate value, for instance the twiddle multiplication result, must be scaled down, the most significant bit of the portion to be rounded off is added to the word which remains. This is a compromise between true rounding and truncation. Compared with the technique that unconditionally sets the bottom bit to '1', the partial rounding scheme achieves better accuracy and is guaranteed to generate an all-zero output block for an all-zero input block.
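One reading of the rounding rule described above is sketched below: the value is shifted right arithmetically, and the most significant discarded bit is added to what remains. This is an interpretation of the text, not the core's RTL or the vendor C model, and both function names are hypothetical. The second function models the alternative the text compares against.

```python
def scale_down_rounded(value: int, shift: int) -> int:
    """Right shift, then add the MSB of the discarded portion."""
    if shift == 0:
        return value
    kept = value >> shift                      # arithmetic shift on Python ints
    msb_of_discarded = (value >> (shift - 1)) & 1
    return kept + msb_of_discarded

def scale_down_force_lsb(value: int, shift: int) -> int:
    """Alternative mentioned in the text: truncate and set the bottom bit to 1."""
    return (value >> shift) | 1

print(scale_down_rounded(7, 2), scale_down_rounded(0, 3), scale_down_force_lsb(0, 3))
```

Under this reading an all-zero input stays zero after rounding, whereas forcing the bottom bit turns a zero into 1, which illustrates the guarantee stated in the text.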

CS2412 detects overflow at each computation stage and uses the following procedure to saturate output overflow samples:

if (X >= 32768) X = 32767;

if (X <= -32768) X = -32767;
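The same procedure can be written as a small function. The sketch below mirrors the two conditions exactly as stated, so a value of -32768 is mapped to -32767 and the output range is symmetric; the function name is illustrative.

```python
def saturate16(x: int) -> int:
    """Saturate x to the range used by the procedure above."""
    if x >= 32768:
        return 32767
    if x <= -32768:
        return -32767
    return x

print([saturate16(v) for v in (40000, 100, -32768, -40000)])
# [32767, 100, -32767, -32767]
```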

The bit-accurate C model provided with the core can be used to check the output error with respect to the SDC signal. Table 3 lists the output error for each SDC setting.

| SDC | Number of bit<br>shifts | Maximum<br>Error | Overflow | Mean Errors | Mean Square Error |
|-----|-------------------------|------------------|----------|-------------|-------------------|
| 7   | 11                      | 1                | No       | 0.18604     | 0.18604           |
| 6   | 10                      | 2                | No       | 0.37012     | 0.37402           |
| 5   | 9                       | 3                | No       | 0.51807     | 0.58057           |
| 4   | 8                       | 4                | No       | 0.7334      | 1.0459            |
| 3   | 7                       | 6                | No       | 0.92969     | 1.5928            |
| 2   | 6                       | 8                | Yes      | 0.92773     | 1.6035            |
| 1   | 5                       | 4809             | Yes      | 294.9297    | 425177.6035       |
| 0   | 4                       | 28231            | Yes      | 5250.0874   | 59558970.7437     |

## Table 3: Output Error With Respect to SDC Signal

The figures in this table are obtained from blocks of 16-bit random number input test data vectors. The mean errors are calculated as the difference between the double precision outputs and the 16-bit integer results, while the mean square errors are the square of this difference, averaged over the total number of blocks.
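A minimal sketch of how such error figures could be computed for one block is shown below. It assumes the error is taken as the absolute difference between a double precision reference and the integer output, which the text does not state explicitly; the function name and the sample values are made up for illustration and are not taken from Table 3.

```python
def error_stats(reference, fixed_point):
    """Return (maximum error, mean error, mean square error) for one block."""
    diffs = [abs(r - f) for r, f in zip(reference, fixed_point)]
    count = len(diffs)
    max_error = max(diffs)
    mean_error = sum(diffs) / count
    mean_square_error = sum(d * d for d in diffs) / count
    return max_error, mean_error, mean_square_error

# Hypothetical data: one sample out of four is off by one LSB.
reference = [10.0, -3.0, 7.0, 0.0]
fixed_point = [10, -2, 7, 0]
print(error_stats(reference, fixed_point))  # (1.0, 0.25, 0.25)
```

When every error is either 0 or 1, the mean error and the mean square error coincide, which is consistent with the SDC = 7 row of Table 3.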

## TIMING CHARACTERISTICS

The following timing characteristics are based on the EP20K300EFC672-2X device under commercial temperature range operating conditions.

# LATENCY

The computation latency of the CS2412 is 2040 clock cycles. This includes all the delays for data reshuffling, computation pipeline cuts, and re-ordering of the transform result. The latency is the same in all operating modes, regardless of the settings of the control signals.

#### Transform Time

The CS2412 core achieves the following transform time when clocked at 50 MHz, for example, with continuous input data:

1024-point transform time = $1024 \times \dfrac{1}{50 \times 10^{6}}\ \text{s} = 20.48\ \mu s$
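The same conversion from cycles to time applies to the latency figure. The sketch below assumes the 2040-cycle latency is counted in data clock (CLK) cycles at 50 MHz, which the text does not state explicitly; the constant and function names are illustrative.

```python
CLOCK_HZ = 50e6          # data clock frequency used in the example above
N_POINTS = 1024          # samples per transform block
LATENCY_CYCLES = 2040    # computation latency quoted in the text

def cycles_to_us(cycles: int, clock_hz: float = CLOCK_HZ) -> float:
    """Convert a number of clock cycles to microseconds."""
    return cycles / clock_hz * 1e6

transform_time_us = cycles_to_us(N_POINTS)
latency_us = cycles_to_us(LATENCY_CYCLES)
print(f"{transform_time_us:.2f} us per block, {latency_us:.2f} us latency")
# 20.48 us per block, 40.80 us latency
```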

## **Table 4: Timing Characteristics**

| Characteristic   | Min    | Max    | Units |
|------------------|--------|--------|-------|
| Clock frequency  |        | 50     | MHz   |
| Input setup time | 11.397 |        | ns    |
| Output delay     |        | 11.397 | ns    |




# AVAILABILITY AND IMPLEMENTATION INFORMATION

Amphion offers the CS2412 core in ASIC and programmable logic versions. Consult your local Amphion representative for product specific performance information, current availability of individual products, and lead times on ASIC or different programmable logic core porting.

The implementation information provided in Table 5 has been obtained for the algorithm implemented as a stand-alone design on an EP20K300EFC672-2X device. If the algorithm is implemented on a different Altera device, the performance metrics and density might vary accordingly.

## Table 5: Programmable Logic Cores

| PRODUCT<br>ID | SILICON<br>VENDOR | MAXIMUM<br>FREQUENCY<br>(MHz) | DEVICE<br>RESOURCES<br>USED (LOGIC) | DEVICE RESOURCES<br>USED (MEMORY) | AVAILABILITY |
|---------------|-------------------|-------------------------------|-------------------------------------|-----------------------------------|--------------|
| CS2412AA*     | Altera            | 50                            | 8704 LEs                            | 72 ESBs                           | Now          |

\* Implementation information for ASICs or Xilinx devices is available upon request.

# CS2412 1024-Point Pipelined FFT/IFFT



#### **ABOUT AMPHION**

Amphion (formerly Integrated Silicon Systems) is the leading supplier of speech coding, video/image processing and channel coding application specific silicon cores for system-on-a-chip (SoC) solutions in the broadband, wireless, and multimedia markets.

Web: www.amphion.com

Email: info@amphion.com

### **CORPORATE HEADQUARTERS**

Amphion Semiconductor Ltd 50 Malone Road Belfast BT9 5BS Northern Ireland, UK Tel: +44 28 9050 4000

Fax: +44 28 9050 4001

## **EUROPEAN SALES**

Amphion Semiconductor Ltd CBXII, West Wing 382-390 Midsummer Boulevard Central Milton Keynes MK9 2RG England, UK Tel: +44 1908 847109

Fax: +44 1908 847580

### **WORLDWIDE SALES & MARKETING**

Amphion Semiconductor, Inc 2001 Gateway Place, Suite 130W San Jose, CA 95110

Tel: (408) 441 1248 Fax: (408) 441 1239

#### **CANADA & EAST COAST US SALES**

Amphion Semiconductor, Inc Montreal Quebec Canada

Tel: (450) 455 5544 Fax: (450) 455 5543

## SALES AGENTS

| Voyageur Technical Sales Inc<br>1 Rue Holiday<br>Tour Est, Suite 501<br>Point Claire, Quebec<br>Canada H9R 5N3 | Phoenix Technologies Ltd<br>3 Gavish Street<br>Kfar-Saba, 44424<br>Israel | SPINNAKER SYSTEMS INC<br>Hatchobori SF Bldg. 5F 3-12-8<br>Hatchobori, Chuo-ku<br>Tokyo 104-0033 Japan |
|---|---|---|
| Tel: (905) 672 0361<br>Fax: (905) 677 4986 | Tel: +972 9 7644 800<br>Fax: +972 9 7644 801 | Tel: +81 3 3551 2275<br>Fax: +81 3 3351 2614 |
| JASONTECH, INC<br>Hansang Building, Suite 300<br>Bangyidong 181-3, Songpaku<br>Seoul Korea 138-050 | SPS-DA PTE LTD<br>21 Science Park Rd<br>#03-19 The Aquarius<br>Singapore Science Park II<br>Singapore 117628 | |
| Tel: +82 2 420 6700<br>Fax: +82 2 420 8600 | Tel: +65 774 9070<br>Fax: +65 774 9071 | |

© 2002 Amphion Semiconductor Ltd. All rights reserved. Amphion, the Amphion logo, "Virtual Components for the Converging World", are trademarks of Amphion Semiconductor Ltd. All others are the property of their respective owners.