Fast Computing Scheme of DCT Coefficients for Image Processing

2010-02-08 19:33LIENChunhungLAIEugeneandCHANGWenching

电子科技大学学报 2010年5期

关键词：余弦移位运算

LIEN Chun-hung, LAI Eugene, and CHANG Wen-ching

(Dept. of Electrical Engineering, Tamkang University Taipei Chinese 25137)

Fast Computing Scheme of DCT Coefficients for Image Processing

LIEN Chun-hung, LAI Eugene, and CHANG Wen-ching

(Dept. of Electrical Engineering, Tamkang University Taipei Chinese 25137)

Because of the importance of the discrete cosine transform (DCT)in the field of image processing,various algorithms and architectures for 2-D DCT processor design have been proposed. In this paper, a novel fast computing mechanism for 2-D 8×8 DCT and quantization for JPEG or MPEG codec is presented. By an effective judging mechanism for an 8×8 image block, the algorithm will adjust DCT computing time depending on the distribution of an image block. This algorithm costs a few adders, shifters and comparators, but it reduces significantly the number of DCT computing times which dominates the computing performance of image processing. The simulation result shows that the proposed algorithm could save more DCT calculation than conventional DCT and integer DCT, and when quantization parameter is large, such as 32, the performance of proposed algorithm is better than that of small one.

discrete cosine transform; fast computing scheme; image processing; 2-D DCT

The discrete cosine transform (DCT)plays an important role in many image and video processing applications[1], especially in the mobile environment where the limited computing power or battery supply is a significant concern. In mobile or portable applications, the cost of multiplications is typically much more expensive than other operations such as additions, subtractions and binary shifts.

1 Introduction

There are many papers discussing the fast implementation of the 1D and 2-D DCT. One category of papers use the method based on the decomposition[2-5].Another group of papers use the so-called direct method which has lower computation complexity but leads to slightly irregular structures[6-9]. Recently,several systolic implementations of DCT processors have been proposed[10-11]. These architectures have a high throughput rate and parallel output.

In this paper, we will present a new algorithm and its corresponding architecture with low arithmetic hardware complexity for fast computation of 2-D 8×8 DCT. The kernel architecture only consists of several adders, shifters and comparators.

Figure 1 Block diagram of an image encoder

2 Overview of DCT in Image Processing

The DCT is a transform that can reduce the spatial redundancy and is known to have better energy compaction performance than other transforms[1]. The definition of one dimensional DCT is shown below as Eq. (1)[12]:

The inputted 8×8 pixel blocks are transformed by 2-D DCT to generate 8×8 DCT coefficients, which are quantized for compression. The block diagram is shown in Fig. 1. If we define x(n,m), 0≤n≤7,0 ≤ m ≤ 7 , as pixel-values in an 8×8 pixel block before the DCT, the 2-D 8×8 DCT coefficients F(k,l),0≤k≤7,0≤l≤7, can be computed by Eq. (3).

After the DCT, the DCT coefficients are quantized. These quantized DCT coefficients are scanned in a zig-zag scanning order, as shown in Fig. 2.The zig-zag scan converts the 2-D 8×8 DCT coefficients into a one-dimensional sequence in an approximately ascending spatial frequency order. Since many high-frequency DCT coefficients will be quantized to zeros, there is usually a long stream of zeros at the end of the sequence. This is represented by an end-of-block (EOB)symbol after the last nonzero coefficient to indicate that after this position, all the DCT coefficients in the block are zeros. Fig. 3 shows the distribution of EOB for different quantization parameters. From Fig. 3, we can see that the EOB position decreases when the quantization parameter increases because more DCT coefficients are quantized to zero.

Figure 2 Zig-zag scanning order

This result shows that when the step size of quantization is large, we can be more aggressive in setting the high-frequency DCT coefficients to zeros.

Figure 3 Distribution of EOB for Lena

3 Proposed Fast DCT Algorithm

Depending on the characteristic of the EOB location, we can estimate the EOB location in advance by judging the complexity of input image. If we know the EOB location in advance, we can just calculate several DCT coefficients and neglect high frequency part. The steps shown below exhibit how our proposed algorithm saves the computational effort in DCT.

(3)After the SAD of an 8x8 block is found,compare it with each threshold value to determine which interval the block belongs. The different interval represents the different pixel distribution of an 8x8 block, and the lower threshold value represents the smoother distribution of a block.

(4)From Fig. 4, by judging which interval the block belongs to, the number of calculation times of DCT and quantization can be determined.

Figure 4 Block diagram of the proposed algorithm

If the block is with low complexity, we may just calculate some DCT coefficients, but if the block is with high complexity, we should calculate most of the DCT coefficients by the judgment of our algorithm.

4 Analysis and Discussion

In this paper, 3 quantization values are used for testing the proposed algorithm, and the values are 8, 16 and 32, respectively. In order to evaluate the performance of the proposed algorithm, we use the accumulated times of addition/subtraction,multiplication and shift/round. To compare with other operations of DCT and quantization, the times of multiplication must be particularly mentioned due to its high complexity and long processing time. We use the integer DCT proposed by Ref. [13], which plays an important role in H.263, for the comparison. This integer DCT already reduces much computing complexity from the conventional DCT.

PSNR is also adopted for determining the loss of image quality since the image quality may be reduced in the proposed algorithm, where PSNR is defined as Eq (7).

Table 1 Simulation result of Lena with quantization parameter of 8

where MSE(mean square error)is shown below:

where Xiis the original pixel and Xi' is the pixel after processed by the proposed algorithm.

The PSNR results of Conventional DCT, Integer DCT and Proposed Algorithm are 36.752 dB, 36.901 dB and 36.898 dB respectively.

Table 1 is the simulation result of several algorithms under quantization parameter of 8 and the sample image is the commonly used Lena. It shows that our proposed algorithm saves most in multiplication among these 3 algorithms.

If we use larger quantization parameter, such as 16 or 32, as shown in Fig. 5, it is obvious that our algorithm saves more computing power in multiplication. From the simulation result of PSNR, we also find that the image quality does not degrade significantly. Fig. 6 shows the percentage of saved multiplication with 3 different quantization parameters of 8, 16 and 32 by using the commonly used 4 samples.

Figure 6 Percentage of saved multiplication vs.3 quantization parameters

When quantization parameter is large, the proposed algorithm can save more in multiplication. In addition to the quantization parameter, the distribution of the image also affects the performance of the proposed algorithm. If the content of an image is complicated, the proposed algorithm may save less in multiplication than the simple one.

5 Conclusion

In this paper, we propose a new algorithm to save the computation in DCT and quantization. Based on the simulation result, the proposed algorithm can substantially reduce the calculation complexity of 2-D 8x8 DCT. The performance of the proposed algorithm is also concerned with the content of image sample.

Besides, based on the simulation result, the degradation of image quality in this algorithm is negligible. But if the image quality is an issue, we can easily adjust the threshold value for obtaining high output image quality. The architecture of the proposed algorithm is effective and practical by using adder,shifter and comparator.

[1]RAO K R, YIP P. Discrete Cosine Transform- Algorithms,Advantage, Applications[M]. New York: Academic Press,1990.

[2]AHMED N. NATARAJSN T, Rao K R. Discrete cosine transform[J]. IEEE Trans Computers, 1974, 23(1): 90-93.

[3]LI W. A new algorithm to compute the DCT and its inverse.IEEE Trans Signal Processing, 1991, 39(6): 1305-1313.

[4]CHAITALI C, JOSEPH J. Systolic architectures for the computation of the discrete Hartley and the discrete cosine transforms based on prime factor decomposition[J]. IEEE Trans Computers, 1990, 39(11): 1359-1368.

[5]SHING C C, KA L H. Fast algorithms for computing the discrete cosine transform[J]. IEEE Trans Circ Syst, 1993,4 4:185-190.

[6]LIU K J R, CHIU C T, KOLAGOTAL K. Optimal unified architecture for the real-time computation of time-recursive discrete sinusoidal transforms[J]. IEEE Trans Circ Syst for Video Tech, 1994, 4(2): 168-180.

[7]CHO N I, LEE S U. Fast algorithm and implementation of 2-D discrete cosine transform[J]. IEEE Trans Circuits System. 1991, 38: 297-305.

[8]DUHAMEL P, GUILLEMOT C. Polynomial transform computation of 2-D DCT[C]//Proc Int Conf Acoustics,Speech, Signal Processing. Albuquerque, NM, USA: IEEE,1990: 1515-1518.

[9]WU H R, PAOLOUI F J. A two-dimensional fast cosine transform algorithm based on Hou’s approach[J]. IEEE Trans Signal Processing, 1991, 39(2): 544-546.

[10]WEIZHEN M. A novel systolic array implementation of DCT DWT and DFT[C]// IEEE Region 10 Conference on Computer and Communication Systems. Guangzhou. IEEE,1990: 211-215.

[11]CHANG L W, WU M C. A unified systolic array for discrete cosine and sine transform[J]. IEEE Trans Signal Processing, 1991, 39: 192-194.

[12]DIMITROV V S, JULLIEN G A, MILLER W C , A new DCT algorithm based on encoding algebraic integers[C]//Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, [S.l]: [s.n], 1998:1377-1380.

[13]SHIN Kuei-tsong, TSAI Chia-yang, HANG Hsueh-min.Real-time implementation of H.263+ using TI TMS320C6201 digital signal processor[C]//Proceedings of the 2003 International Symposium on Circuits and Systems,[S.l]: IEEE, 2003, 2: 900-903.

编辑张俊

2009- 03- 24;

2010- 01- 26

连俊宏(1976- )，男，博士，主要从事影像压缩及影像处理相关算法、架构及软硬件设计方面的研究.

针对影像处理的快速二维离散余弦转换算法

连俊宏，赖友仁，张文清

(淡江大学电机工程学系中国台北 25137)

由于离散余弦转换在影像处理领域之重要性与日俱增，且消耗许多处理器运算时间，所以众多快速二维离散余弦转换算法不断被发表。该文提出一个应用于JPEG及MPEG图像处理的快速二维8×8离散余弦转换算法，该算法主要运用基本的累加及移位运算，快速评估8×8影像区块的复杂程度，可调整离散余弦转换参数的计算数量。该算法只需花费少量硬件成本，如比较器、加法器、移位器，便可有效降低离散余弦转换运算时间，且在模拟结果显示所提出的算法与传统及整数离散余弦转换相比，可达到较快的运算速度，且在量化系数较大的情况下，可得到更好效果。

散余弦转换; 快速转换算法; 影像处理; 二维离散余弦转换

TN301.6

10.3969/j.issn.1001-0548.2010.05.010

Figure 5 Percentage of saved multiplication of integer DCT and proposed algorithm when using Lena

date：2009 – 03 – 24；Revised date：2010 – 01 – 26

Biofraphy：LIEN Chun-hung was born in 1976, male, doctor, his research interests include algorithms, architecture, and software/hardware co-design of video compression and video processing systems.