A Low-complexity Parallel FIR Filter Structure Based on the Iterated Short Convolution Algorithm

2014-05-30 11:42:18 Tian Jing-jing, Li Guang-jun

Journal of Electronics & Information Technology, 2014, No. 5


Tian Jing-jing*① Li Guang-jun① Li Qiang①②

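① (School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu 611731) ② (Department of Engineering, Aarhus University, Aarhus DK-8000)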

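Abstract: Based on the fast convolution algorithm, this paper proposes a parallel structure for linear-phase FIR filters. The structure uses the fast convolution algorithm to reduce the number of sub-filters, makes as many sub-filters as possible have symmetric coefficients, and then exploits this coefficient symmetry to reduce the number of multipliers in the sub-filter section. For FIR filters with symmetric coefficients, the proposed parallel structure saves a large amount of hardware compared with existing parallel FIR structures, and the advantage is more pronounced when the number of taps is large. Specifically, for a 4-parallel 144-tap FIR filter, the proposed structure saves 36 multipliers (14.3%), 23 adders (6.6%), and 35 delay elements (11.0%) compared with the improved Fast FIR Algorithm (FFA) structure.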

Key words: Parallel FIR filter; Fast convolution; Iterated short convolution; Symmetric coefficients

1 Introduction

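Finite Impulse Response (FIR) filters are widely used in video and image processing, wireless communications, and many other fields because of their excellent linear-phase property and unconditional stability. In some applications, such as high-speed remote-sensing satellite receivers and 4G communication systems, ever-increasing data rates place ever-higher throughput requirements on FIR filters, while in others, such as mobile phones and handheld medical devices, FIR filters are subject to strict power-consumption constraints.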

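This paper proposes an improved parallel FIR filter structure. The structure uses fast convolution algorithms to reduce the number of sub-filters in the parallel structure, makes as many sub-filters as possible have symmetric coefficients, and then exploits this symmetry to reduce the number of multipliers in the sub-filter section. Compared with existing parallel FIR structures, the proposed structure saves further hardware resources, especially when the filter has a large number of taps.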

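The rest of this paper is organized as follows. Section 2 describes how parallel FIR filter structures are derived from linear convolution, Section 3 presents the proposed parallel FIR filter structures, Section 4 compares hardware cost, Section 5 analyzes finite word-length performance, and Section 6 concludes the paper.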

2 Parallel FIR Filter Structures Based on Linear Convolution

3 Proposed Low-complexity Parallel FIR Filter Structures

Figure 2 Implementation of a sub-filter with symmetric coefficients
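The saving that the symmetric sub-filters in Figure 2 provide comes from folding: two delayed samples that share a coefficient are added first and multiplied once. The following is a minimal software sketch of that idea, assuming a direct-form sub-filter with zero initial state; the function names and test data are hypothetical and do not describe the exact hardware of Figure 2.

```python
def direct_fir(h, x):
    """Reference direct-form FIR: y[n] = sum_k h[k] * x[n-k], zero initial state."""
    N = len(h)
    return [sum(h[k] * x[n - k] for k in range(N) if n - k >= 0) for n in range(len(x))]


def folded_fir(h, x):
    """FIR filter for symmetric coefficients (h[k] == h[N-1-k]).

    Samples that share a coefficient are pre-added, so only ceil(N/2)
    multiplications are needed per output sample.
    Returns (outputs, multiplications per output sample).
    """
    N = len(h)
    if any(h[k] != h[N - 1 - k] for k in range(N)):
        raise ValueError("coefficients must be symmetric")

    def tap(n, k):
        # Delay-line content x[n-k]; zero before the first input sample.
        return x[n - k] if n - k >= 0 else 0

    half = N // 2
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(half):
            # One pre-adder and one multiplier per coefficient pair.
            acc += h[k] * (tap(n, k) + tap(n, N - 1 - k))
        if N % 2:
            # Odd length: the centre tap has no partner.
            acc += h[half] * tap(n, half)
        y.append(acc)
    return y, half + N % 2


h = [1, 2, 3, 3, 2, 1]            # hypothetical 6-tap symmetric filter
x = [1, 0, -1, 2, 5, 3, 0, 1]
y, mults = folded_fir(h, x)
```

A 6-tap symmetric sub-filter therefore needs 3 multipliers instead of 6, at the cost of 3 extra pre-adders; this is the trade-off that lets the structures below favour sub-filters with symmetric coefficients.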

3.1 2-parallel structure

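For a 2-parallel linear-phase FIR filter whose number of taps is an integer multiple of the parallelism, the set of sub-filters with symmetric coefficients is given by Eq. (4).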
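Since Eq. (4) is not reproduced here, the sketch below shows only the standard 2-parallel FFA that such structures start from (Ref. [1]): with polyphase components H0, H1 and input streams X0, X1, Y0 = H0X0 + z^-1 H1X1 and Y1 = (H0+H1)(X0+X1) - H0X0 - H1X1, so three sub-filters replace four. It also checks that, for a symmetric filter of even length, H1 is H0 reversed and H0+H1 is symmetric. This is an illustration of the plain FFA under these assumptions, not the proposed 2-parallel structure; all names are hypothetical.

```python
def conv(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def add(a, b, sign=1):
    """a + sign*b element-wise, zero-padding the shorter sequence."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [p + sign * q for p, q in zip(a, b)]


def ffa2(h, x):
    """2-parallel FFA: three sub-filters (H0, H1, H0+H1) instead of four."""
    assert len(h) % 2 == 0 and len(x) % 2 == 0
    H0, H1 = h[0::2], h[1::2]            # polyphase components of h
    X0, X1 = x[0::2], x[1::2]            # even / odd input streams
    P0 = conv(H0, X0)                    # sub-filter H0
    P1 = conv(H1, X1)                    # sub-filter H1
    P2 = conv(add(H0, H1), add(X0, X1))  # sub-filter H0+H1
    Y0 = add(P0, [0] + P1)               # H0X0 + z^-1 H1X1 (one block delay)
    Y1 = add(add(P2, P0, -1), P1, -1)    # (H0+H1)(X0+X1) - H0X0 - H1X1
    y = []
    for k, v in enumerate(Y0):           # interleave the two output streams
        y.append(v)
        if k < len(Y1):
            y.append(Y1[k])
    return y


h = [1, 2, 3, 3, 2, 1]     # hypothetical symmetric 6-tap filter (N = 2 * 3)
x = [1, 0, -1, 2, 5, 3, 0, 1]
```

Here ffa2(h, x) reproduces the full convolution of h and x, and only the H0+H1 sub-filter inherits the symmetry of h, which is why the plain FFA leaves most of the symmetry saving unused.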

3.2 3-parallel structure

For a 3-parallel linear-phase FIR filter, the set of sub-filters with symmetric coefficients is given by Eq. (6).
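Which sub-filters can be symmetric follows from a simple index relation. If h[k] = h[N-1-k] and N = L*M, then the polyphase component H_i[j] = h[L*j + i] satisfies H_i[j] = H_(L-1-i)[M-1-j], i.e. reversing H_i gives H_(L-1-i). For L = 3 this makes H1 symmetric, makes H2 the reverse of H0, and makes H0+H1+H2 symmetric. The sketch below checks this numerically; it does not reproduce the set in Eq. (6), and the filter values are hypothetical.

```python
def polyphase(h, L):
    """Polyphase components H_i[j] = h[L*j + i], i = 0..L-1."""
    return [h[i::L] for i in range(L)]


def mirror_relation_holds(h, L):
    """For symmetric h with len(h) a multiple of L, check reverse(H_i) == H_(L-1-i)."""
    H = polyphase(h, L)
    return all(H[i][::-1] == H[L - 1 - i] for i in range(L))


def elementwise_sum(*seqs):
    """Coefficients of the sub-filter H_a + H_b + ..."""
    return [sum(vals) for vals in zip(*seqs)]


def is_symmetric(g):
    return g == g[::-1]


h = [1, 4, 2, 7, 5, 9, 9, 5, 7, 2, 4, 1]   # hypothetical 12-tap symmetric filter (N = 3 * 4)
H0, H1, H2 = polyphase(h, 3)
```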

Table 1 Corresponding sub-filters in the 2-parallel filter structure

Figure 3 Implementation of the proposed 2-parallel FIR filter

Table 2 Corresponding sub-filters in the 3-parallel filter structure

Figure 4 Implementation of the proposed 3-parallel FIR filter

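As shown in Figure 5, the blocks with a black background are sub-filter modules with symmetric coefficients. The proposed 3-parallel FIR filter structure has 5 sub-filters, 2 of which have symmetric coefficients, whereas the 3-parallel structure in Ref. [15] has 6 sub-filters, 4 of which have symmetric coefficients.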
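The sub-filter counts above translate directly into multiplier counts. Assuming that every sub-filter in both 3-parallel structures has N/3 taps and that a symmetric sub-filter needs ceil(N/6) multipliers, a quick check gives 4*12 + 2*24 = 96 for Ref. [15] and 2*12 + 3*24 = 96 for the proposed structure at N = 72. This matches the M column of Table 4 (96 and 192 for both structures), so in the 3-parallel case the savings come from adders and delay elements rather than multipliers. The helper name is hypothetical.

```python
def subfilter_multipliers(num_taps, L, n_subfilters, n_symmetric):
    """Multipliers in a sub-filter section, assuming every sub-filter has
    num_taps // L taps and a symmetric sub-filter needs ceil(length / 2)."""
    length = num_taps // L
    symmetric_cost = (length + 1) // 2
    return n_symmetric * symmetric_cost + (n_subfilters - n_symmetric) * length


ref15 = subfilter_multipliers(72, 3, n_subfilters=6, n_symmetric=4)     # 4*12 + 2*24
proposed = subfilter_multipliers(72, 3, n_subfilters=5, n_symmetric=2)  # 2*12 + 3*24
```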

3.3 4-parallel structure

For a 4-parallel linear-phase FIR filter, the set of sub-filters with symmetric coefficients is given by Eq. (8).
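Because reversing H_i gives H_(L-1-i) for a symmetric prototype, a sub-filter of the form sum_i c_i*H_i is symmetric (for generic coefficients) exactly when c_i = c_(L-1-i). The sketch below applies this test to the nine sub-filters of the standard 2x2 iterated FFA for L = 4 (H0, H2, H0+H2, H1, H3, H1+H3, H0+H1, H2+H3, H0+H1+H2+H3) and confirms it numerically. That sub-filter list is an assumption taken from the plain iterated FFA, not the set in Eq. (8); it shows why a structure must be reorganized to obtain more symmetric sub-filters.

```python
def polyphase(h, L):
    """Polyphase components H_i[j] = h[L*j + i]."""
    return [h[i::L] for i in range(L)]


def combine(H, coeffs):
    """Coefficients of the sub-filter sum_i coeffs[i] * H_i."""
    return [sum(c * Hi[j] for c, Hi in zip(coeffs, H)) for j in range(len(H[0]))]


def predicted_symmetric(coeffs):
    """Symmetry criterion c_i == c_(L-1-i) for a symmetric prototype filter."""
    L = len(coeffs)
    return all(coeffs[i] == coeffs[L - 1 - i] for i in range(L))


# Sub-filters of the standard 2x2 iterated FFA (4-parallel) as weights on H0..H3.
FFA_2x2 = {
    "H0": (1, 0, 0, 0), "H2": (0, 0, 1, 0), "H0+H2": (1, 0, 1, 0),
    "H1": (0, 1, 0, 0), "H3": (0, 0, 0, 1), "H1+H3": (0, 1, 0, 1),
    "H0+H1": (1, 1, 0, 0), "H2+H3": (0, 0, 1, 1), "H0+H1+H2+H3": (1, 1, 1, 1),
}


def symmetric_subfilters(h, table):
    """Names of the sub-filters in `table` whose coefficients are symmetric."""
    L = len(next(iter(table.values())))
    H = polyphase(h, L)
    names = []
    for name, c in table.items():
        g = combine(H, c)
        if g == g[::-1]:
            names.append(name)
    return names


half = [3, 1, 4, 1, 5, 9, 2, 6]
h = half + half[::-1]          # hypothetical 16-tap symmetric filter (N = 4 * 4)
```

With these assumptions only one of the nine sub-filters (the sum of all four components) is symmetric, both by the criterion and numerically.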

Figure 5 Sub-filter sections of the 3-parallel improved FFA structure and of the proposed 3-parallel structure

Table 3 Corresponding sub-filters in the 4-parallel filter structure

Figure 6 Sub-filter sections of the 4-parallel improved FFA structure and of the proposed 4-parallel structure

3.4 Iterated structure
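The principle behind iterating short convolutions (Ref. [5]) can be illustrated with polynomials: nesting the 2x2 fast convolution, which uses 3 products instead of 4, at every level gives 3^k products for an L = 2^k-point linear convolution. In an FFA-type parallel filter each such leaf product corresponds to one sub-filter of length N/L, so this yields 3, 9, and 27 sub-filters for L = 2, 4, and 8. The sketch assumes the 2x2 fast convolution as the only building block and is not the iterated structure of this section; the names are hypothetical.

```python
def schoolbook(a, b):
    """Direct linear convolution, used as the reference."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def iterated_2x2(a, b, counter):
    """Linear convolution of two length-2^k sequences by nesting the 2x2
    fast convolution (3 products instead of 4) at every level.
    counter[0] accumulates the number of leaf products."""
    n = len(a)
    if n == 1:
        counter[0] += 1
        return [a[0] * b[0]]
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    p0 = iterated_2x2(a0, b0, counter)                 # a0*b0
    p2 = iterated_2x2(a1, b1, counter)                 # a1*b1
    p1 = iterated_2x2([u + v for u, v in zip(a0, a1)],
                      [u + v for u, v in zip(b0, b1)], counter)
    mid = [m - u - v for m, u, v in zip(p1, p0, p2)]   # a0*b1 + a1*b0
    out = [0] * (2 * n - 1)
    for i, v in enumerate(p0):
        out[i] += v
    for i, v in enumerate(mid):
        out[i + h] += v
    for i, v in enumerate(p2):
        out[i + 2 * h] += v
    return out
```

For example, counter = [0]; iterated_2x2([1, 2, 3, 4], [5, 6, 7, 8], counter) returns the same result as schoolbook and leaves counter[0] equal to 9.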

4 Complexity Comparison

The total number of delay elements required is given by Eq. (13).

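Table 4 compares the hardware cost of the proposed structures and of the improved FFA structures in Ref. [15] for different levels of parallelism and numbers of taps. The compared quantities are the number of multipliers (M), the percentage of multipliers saved (RM), the total number of adders (A), the number of adders in the sub-filter section (Sub), the number of adders in the pre- and post-processing matrices (P), the percentage of adders saved (RA), and the percentage of delay elements saved (RD). Table 5 lists the numbers of multipliers (M), adders (A), and delay elements (D) consumed by 4-parallel and 8-parallel 144-tap FIR filters under different implementation structures. As Table 4 shows, compared with the structure in Ref. [15], the proposed 4-parallel structure saves 14.3% of the multipliers, 4.9% to 6.6% of the adders, and 10.9% to 11.0% of the delay elements. The proposed 8-parallel structure saves 12.8% to 13.0% of the multipliers, -1.1% to 3.9% of the adders (at 72 taps it needs 1.1% more adders), and 10.8% to 10.9% of the delay elements. The percentage of adders and delay elements saved depends on the number of taps: the more taps the filter has, the larger the percentage saved.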

Table 4 Hardware cost of the proposed structures versus the improved FFA structures in Ref. [15]

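| Parallelism | Taps | Structure | M | RM (%) | A = Sub + P | RA (%) | RD (%) |
|---|---|---|---|---|---|---|---|
| 3 | 72 | Ref. [15] | 96 | | 138 + 17 | | |
| 3 | 72 | Proposed | 96 | 0 | 115 + 18 | 14.2 | 16.4 |
| 3 | 144 | Ref. [15] | 192 | | 282 + 17 | | |
| 3 | 144 | Proposed | 192 | 0 | 235 + 18 | 15.4 | 16.5 |
| 4 | 72 | Ref. [15] | 126 | | 153 + 31 | | |
| 4 | 72 | Proposed | 108 | 14.3 | 136 + 39 | 4.9 | 10.9 |
| 4 | 144 | Ref. [15] | 252 | | 315 + 31 | | |
| 4 | 144 | Proposed | 216 | 14.3 | 280 + 39 | 6.6 | 11.0 |
| 8 | 72 | Ref. [15] | 211 | | 216 + 134 | | |
| 8 | 72 | Proposed | 184 | 12.8 | 192 + 162 | -1.1 | 10.8 |
| 8 | 144 | Ref. [15] | 414 | | 459 + 134 | | |
| 8 | 144 | Proposed | 360 | 13.0 | 408 + 162 | 3.9 | 10.9 |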
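The savings columns are relative reductions with respect to Ref. [15]. Assuming RM = (M_ref - M_prop) / M_ref and likewise for RA with A = Sub + P, the sketch below recomputes RM and RA from the counts of several rows of Table 4; the dictionary layout and function name are hypothetical.

```python
# (parallelism, taps): ((M_ref, A_ref), (M_prop, A_prop)) from Table 4, A = Sub + P.
TABLE4 = {
    (3, 72): ((96, 138 + 17), (96, 115 + 18)),
    (3, 144): ((192, 282 + 17), (192, 235 + 18)),
    (4, 72): ((126, 153 + 31), (108, 136 + 39)),
    (8, 72): ((211, 216 + 134), (184, 192 + 162)),
    (8, 144): ((414, 459 + 134), (360, 408 + 162)),
}


def saving_percent(ref, prop):
    """Relative saving of the proposed structure, in percent, one decimal."""
    return round(100 * (ref - prop) / ref, 1)


savings = {
    key: (saving_percent(ref[0], prop[0]), saving_percent(ref[1], prop[1]))
    for key, (ref, prop) in TABLE4.items()
}

for (L, taps), (rm, ra) in savings.items():
    print(f"{L}-parallel, {taps} taps: RM = {rm}%, RA = {ra}%")
```

A negative RA, as for the 8-parallel 72-tap filter, means that the extra pre- and post-processing adders outweigh the adders saved in the sub-filter section.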

5 Finite Word-length Performance Analysis

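Table 6 compares the finite word-length performance of the proposed structures and of the filter structures in Ref. [15] at the same quantization word length. The proposed structures have a larger mean squared error, mainly because the constant factors in front of their sub-filters have larger denominators that are not powers of two, so quantizing the filter coefficients introduces a larger quantization error. Given the substantial hardware savings of the proposed structures, a moderate degradation in finite word-length performance is acceptable.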
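The effect of a non-power-of-two denominator can be shown with exact arithmetic. Scaling a B-bit coefficient by 1/2 is a shift and stays exact with one extra fractional bit, whereas scaling by 1/3 produces a value with no finite binary expansion, so rounding always leaves an error. The divisor 3, the coefficient value, and the word length below are hypothetical choices for illustration; the actual constants of the proposed structures and the errors in Table 6 are not reproduced.

```python
from fractions import Fraction


def quantize(value, frac_bits):
    """Round a value to the nearest multiple of 2**-frac_bits."""
    step = Fraction(1, 2 ** frac_bits)
    return round(Fraction(value) / step) * step


def scaled_error(h, divisor, frac_bits):
    """Error when the pre-scaled coefficient h/divisor is stored with frac_bits fractional bits."""
    exact = Fraction(h) / divisor
    return abs(exact - quantize(exact, frac_bits))


B = 12
h = quantize(Fraction(3, 10), B)        # a coefficient already held with B fractional bits
err_half = scaled_error(h, 2, B + 1)    # divide by 2: exact with one extra bit
err_third = scaled_error(h, 3, B + 1)   # divide by 3: not exactly representable
```

The error for the 1/3 case is bounded by half a quantization step, but it is generally nonzero for every coefficient, and these per-coefficient errors add up in the output mean squared error.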

Table 5 Hardware resources used by 144-tap filters

Table 6 Mean squared error of the proposed structures versus the improved FFA structures in Ref. [15]

6 Conclusion

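This paper has presented improved parallel filter structures for linear-phase FIR filters. The proposed structures exploit coefficient symmetry together with fast convolution algorithms to save hardware resources. Compared with existing parallel FIR structures, the proposed structures have somewhat degraded finite word-length performance but save considerable hardware, and the larger the number of taps of the FIR filter, the greater the savings.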

[1] Parhi K K. VLSI Digital Signal Processing Systems: Design and Implementation[M]. New York: John Wiley & Sons, 2007: 237-275.

[2] Parker D A and Parhi K K. Low-area/power parallel FIR digital filter implementations[J]. 1997, 17(1): 75-92.

[3] Deng Jun and Yang Yin-tang. Structure of interpolation filter based on parallel pipelining and fast FIR algorithm and its implementation for all digital receiver[J]. Journal of Electronics & Information Technology, 2010, 32(9): 2089-2094.

[4] Acha J I. Computational structures for fast implementation of-path and-block digital filters[J]. 1989, 36(6): 805-812.

[5] Cheng C and Parhi K K. Hardware efficient fast parallel FIR filter structures based on iterated short convolution[J]. 2004, 51(8): 1492-1500.

[6] Cheng C and Parhi K K. Further complexity reduction of parallel FIR filters[C]. Proceedings of IEEE International Symposium on Circuits and Systems, Kobe, 2005: 1835-1838.

[7] Cheng C and Parhi K K. Low-cost parallel FIR filter structures with 2-stage parallelism[J]. 2007, 54(2): 280-290.

[8] Aktan M, Yurdakul A, and Dundar G. An algorithm for the design of low-power hardware-efficient FIR filter[J]. 2008, 55(6): 1536-1545.

[9] Shi D and Yu Y J. Design of discrete-valued linear phase FIR filters in cascade form[J]. 2011, 58(7): 1627-1636.

[10] Park S Y and Meher P K. Low-power, high-throughput, and low-area adaptive FIR filter based on distributed arithmetic[J]. 2013, 60(6): 346-350.

[11] Tsao Y C and Choi K. Hardware-efficient parallel FIR digital filter structures for symmetric convolutions[C]. Proceedings of IEEE International Symposium on Circuits and Systems, Rio de Janeiro, 2011: 2301-2304.

[12] Tsao Y C and Choi K. Hardware-efficient VLSI implementation for 3-parallel linear-phase FIR digital filter of odd length[C]. Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Seoul, 2012: 998-1001.

[13] Liu Z, Ye F, and Ren J. Low-cost parallel FIR digital filter structures utilizing the coefficient symmetry[C]. IEEE 11th International Conference on Solid-State and Integrated Circuit Technology (ICSICT), Xi’an, 2012: 1-3.

[14] Tsao Y C and Choi K. Area-efficient VLSI implementation for parallel linear-phase FIR digital filters of odd length based on fast FIR algorithm[J]. 2012, 59(6): 371-375.

[15] Tsao Y C and Choi K. Area-efficient parallel FIR digital filter structures for symmetric convolutions based on fast FIR algorithm[J]. 2012, 20(2): 366-371.

[16] Selvakumar J, Narendran S, and Bhaskar V. FPGA based efficient fast FIR algorithm for higher order digital FIR filter[C]. International Symposium on Electronic System Design (ISED), Kolkata, 2012: 43-47.

Tian Jing-jing: male, born in 1989, M.S. candidate. His research interest is the VLSI implementation of digital signal processing.

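Li Guang-jun: male, born in 1950, Professor and Ph.D. supervisor. His research interests include communication system design, ASIC/SoC design, and signal and information processing.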

Li Qiang: male, born in 1979, Professor and Ph.D. supervisor. His research interest is mixed-signal integrated circuits.

Hardware-efficient Parallel Structures for Linear-phase FIR Digital Filter Based on Iterated Short Convolution Algorithm

Tian Jing-jing① Li Guang-jun① Li Qiang①②

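①(School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu 611731) ②(Department of Engineering, Aarhus University, Aarhus DK-8000)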

Abstract: Based on the fast convolution algorithm, improved parallel FIR filter structures are proposed for linear-phase FIR filters whose number of taps is a multiple of the parallelism. The proposed parallel FIR structures not only use the fast convolution algorithm to reduce the number of sub-filters, but also exploit the symmetric coefficients of linear-phase FIR filters to halve the number of multiplications in the sub-filter section, at the expense of additional adders in the pre-processing and post-processing blocks. For filters with symmetric coefficients, the proposed parallel FIR structures save a large amount of hardware compared with previously reported parallel FIR filter structures, especially when the filter is long. Specifically, for a 4-parallel 144-tap filter, the proposed structure saves 36 multipliers (14.3%), 23 adders (6.6%), and 35 delay elements (11.0%) compared with the improved Fast FIR Algorithm (FFA) structure.

Key words: Parallel FIR filter; Fast convolution; Iterated short convolution; Symmetric coefficients

CLC number: TN713.7

Article ID: 1009-5896(2014)05-1151-07

DOI: 10.3724/SP.J.1146.2013.00976

Corresponding author: Tian Jing-jing, jing.jing.t@163.com

Received 2013-07-08; revised 2013-11-08

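Supported by the National Natural Science Foundation of China (61006027) and the Program for New Century Excellent Talents in University (NCET-10-0297)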