LU Qian-li (陆千里), CEN Feng (岑锋)*, XU Wei-sheng (许维胜), ZHU Fang-lai (朱芳来)
1 The Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 200092, China
2 Department of Control Science & Engineering, Tongji University, Shanghai 200092, China
The hierarchical structure is an important coding tool in video coding. It usually shows better coding efficiency than the conventional “IPPP” or “IBBP” structure [1,2]. Furthermore, the hierarchical structure with proper management of multiple reference pictures can be utilized to provide temporal scalability, as in scalable video coding (SVC) [2,3].
Rate control (RC), as an integral part of various practical video codecs, is employed to achieve consistently good video quality under channel bandwidth and buffer constraints. It usually includes two steps: bit allocation, and achievement of the allocated bits, i.e., quantization parameter (QP) determination. To achieve the optimal rate-distortion tradeoff, many approaches to bit allocation have been reported in the literature. However, for the hierarchical structure, an appropriate rate-quantization (R-Q) model for determining QP has not been developed explicitly.

Li et al. [4] proposed an RC scheme for H.264/AVC. In their scheme, the target bits are determined by the number of remaining bits according to the basic unit and the buffer occupation. Then a quadratic R-Q model is used to calculate QP for the current basic unit. Since the actual mean absolute difference (MAD) of the basic unit, which serves as the complexity measure in the model, can only be obtained after the estimation of motion vectors, and motion estimation in turn depends on QP, there exists a chicken-and-egg dilemma. To solve the dilemma, a linear model is employed to estimate the MAD value for the R-Q model. Lim et al. [5] proposed an RC algorithm which employed a ρ-domain source model to determine the QP value. In spite of the simplicity and efficiency of the algorithms in Refs. [4,5], the hierarchical structure is not explicitly considered. Based on the approach in Ref. [4], Leontaris et al. [6] proposed four benchmark rate control modes as the H.264 reference models. Among them, Mode 3 takes the hierarchical structure into account and showed improvement in total bit allocation for it. However, there are two issues in the algorithm. First, the parameters of the R-Q model were updated based on the characteristics of previous P frames. With the model parameters updated from P frames, the QP determination for B frames, especially B frames in the high temporal level (TL), was inaccurate. Second, they overlooked the differences in the characteristics of frames of different slice types or TLs, and employed a single linear MAD model to estimate the MADs for all slice types and TLs. Li et al. [7] proposed an algorithm to allocate the target bits for a frame by considering the TL. However, the R-Q model is the same quadratic R-Q model as in Refs. [4,6], and the rate-distortion (R-D) performance of the bit allocation approach is worse than that in Ref. [6]. In Ref. [8], Seo et al. proposed bit allocation schemes at the frame level or macroblock (MB) level and a new R-Q model to improve the coding efficiency. Based on the rate-MAD (R-MAD) model in Ref. [9], they proposed an R-Q model using the variance of difference (VOD) and MAD to measure the content complexity. The potential problems are that the R-Q model does not consider the differences between frames of different slice types or TLs, and that it requires a lot of computation.
As mentioned above, the R-Q models adopted to determine QP in state-of-the-art schemes are directly derived from the statistical characteristics of I/P frames [6] or of all the frames [8] in the video sequence, and do not take the differences between frames of different slice types or TLs into account. To overcome this limitation of the conventional schemes, an independent R-Q model is presented for the hierarchical structure in this work. For the independent R-Q model, a novel MAD estimation algorithm is developed to predict the MAD for the current frame in the hierarchical structure. The experimental results demonstrate that the proposed RC can achieve better R-D performance than the original algorithm [6].
The proposed algorithm for the hierarchical coding structure is composed of two techniques: (1) MAD estimation, and (2) an independent R-Q model for B frames in the hierarchical structure.
To illustrate the relationship between the MADs of P and B frames in different TLs, the MAD curve of “Mobile & Calendar” plotted against the picture order count (POC) is shown in Fig. 1. The frame rate is 30 fps (frames per second), QP is fixed to 24, and the hierarchical group of pictures (HGOP) size is set to 8. Figure 2 represents the structure of HGOP 8, where B frames in low TLs are employed as reference frames for frames in high TLs.
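As a rough illustration of the HGOP 8 layout described above, the following Python sketch maps a frame position to its temporal level. It assumes a dyadic hierarchy in which the POC advances by one per frame in display order and the key (P) frames sit at multiples of the HGOP size. The function name `temporal_level` is hypothetical and is not taken from any reference software.

```python
def temporal_level(poc: int, hgop_size: int = 8) -> int:
    """Return the temporal level (TL) of a frame in a dyadic hierarchical GOP.

    Assumption: TL 0 holds the key (P) frames at multiples of hgop_size, and
    each further level halves the distance between frames of that level.
    """
    if hgop_size < 1 or hgop_size & (hgop_size - 1):
        raise ValueError("hgop_size must be a power of two")
    pos = poc % hgop_size
    if pos == 0:
        return 0  # key frame
    level = hgop_size.bit_length() - 1  # log2(hgop_size), the highest TL
    # Every factor of two in the position moves the frame one level lower.
    while pos % 2 == 0:
        pos //= 2
        level -= 1
    return level


if __name__ == "__main__":
    for poc in range(9):
        print(poc, temporal_level(poc))
```

Under these assumptions, an HGOP of size 8 has one B frame in TL 1 (position 4), two in TL 2 (positions 2 and 6), and four in TL 3 (the odd positions), which matches the three B-frame TLs discussed in the text.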
From Fig. 1, it is easy to observe that the lower the TL is, the larger the average MAD is. Moreover, within a given HGOP, the MADs of B frames in the same TL are similar, and there is a linear relationship between them. Therefore, the MAD of the kth B frame in the lth TL can be estimated based on the MAD of the P frame in the HGOP as follows:
Fig. 1 MAD curve of “Mobile & Calendar” (QP = 24)
In contrast, if the current frame is not the first B frame in the HGOP, the MADs of preceding B frames in the same TL should also be considered as follows:
To sum up, the MAD of the kth B frame in the lth TL is estimated as follows:
where $C_l$ is a weighting factor in the lth TL, empirically set to 0.1.
Fig. 2 Hierarchical structure with an HGOP size of 8
Figure 1 also shows that, with a constant QP, the MADs of P frames are similar. The MADs of P frames in TL 0 are therefore predicted by a linear model:
where $\delta_p$ denotes the predicted MAD of the P frame in the HGOP, and $c_1$ and $c_2$ denote the two model parameters, which are updated after encoding each P frame.
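The linear MAD prediction for P frames can be sketched as follows. This is a minimal sketch that assumes the common form used in the linear MAD model of Ref. [4], in which the predicted MAD equals c1 times the actual MAD of the previous P frame plus c2. The class name, the initial values c1 = 1 and c2 = 0, and the window of 20 frames are assumptions made for illustration, not values given in the text.

```python
from collections import deque


class LinearMadPredictor:
    """Predict the MAD of the next P frame as c1 * previous_actual_mad + c2."""

    def __init__(self, window: int = 20):
        self.c1, self.c2 = 1.0, 0.0          # assumed initial parameters
        self.pairs = deque(maxlen=window)     # (previous MAD, current MAD)
        self.last_mad = None

    def predict(self) -> float:
        if self.last_mad is None:
            raise ValueError("no P frame has been encoded yet")
        return self.c1 * self.last_mad + self.c2

    def update(self, actual_mad: float) -> None:
        """Call after encoding a P frame, once its actual MAD is known."""
        if self.last_mad is not None:
            self.pairs.append((self.last_mad, actual_mad))
            self._refit()
        self.last_mad = actual_mad

    def _refit(self) -> None:
        # Ordinary least squares over the sliding window of MAD pairs.
        n = len(self.pairs)
        if n < 2:
            return
        sx = sum(x for x, _ in self.pairs)
        sy = sum(y for _, y in self.pairs)
        sxx = sum(x * x for x, _ in self.pairs)
        sxy = sum(x * y for x, y in self.pairs)
        denom = n * sxx - sx * sx
        if abs(denom) < 1e-12:
            return  # degenerate window: keep the previous parameters
        self.c1 = (n * sxy - sx * sy) / denom
        self.c2 = (sy - self.c1 * sx) / n


if __name__ == "__main__":
    predictor = LinearMadPredictor()
    for mad in (2.0, 3.0, 4.0):
        predictor.update(mad)
    print(predictor.predict())
```

Keeping a bounded window lets the parameters follow gradual changes in scene content while discarding stale frames.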
In conclusion, with Eqs. (3) and (4), the MADs can be estimated for the P frame and the B frames in the HGOP.
Benchmark RC algorithms in H.264 usually use the quadratic R-Q model to determine the quantization step size ($Q_s$), which doubles for every increment of 6 in QP. In Ref. [6], the RC algorithms for the hierarchical B frame structure were proposed to control the bitrate with the R-Q model as follows:

$$T = X_{p,1}\,\frac{\delta}{Q_s} + X_{p,2}\,\frac{\delta}{Q_s^{2}} \qquad (5)$$
where $T$ and $\delta$ denote the texture bits and the estimated MAD, respectively. $X_{p,1}$ and $X_{p,2}$ denote the two model parameters, which are updated with the sliding-window management process after coding the P frame in the hierarchical structure. The QP values of the B frames are determined by the quadratic R-Q model derived from the characteristics of P frames.
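To make the QP determination step concrete, the sketch below solves the quadratic R-Q model for the quantization step and maps it to the nearest QP. It assumes the standard quadratic form T = X1·δ/Qs + X2·δ/Qs², and it uses the nominal H.264 step-size table, in which the step doubles for every increment of 6 in QP. The function names and the fallback to the linear term are illustrative assumptions and do not reproduce the JM implementation.

```python
import math

# Nominal H.264 quantization step sizes for QP 0..5; the step doubles every 6 QP.
QP2QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qp_to_qstep(qp: int) -> float:
    return QP2QSTEP_BASE[qp % 6] * (2 ** (qp // 6))


def qstep_to_qp(qstep: float, qp_min: int = 0, qp_max: int = 51) -> int:
    """Return the QP whose nominal step size is closest to qstep."""
    return min(range(qp_min, qp_max + 1),
               key=lambda qp: abs(qp_to_qstep(qp) - qstep))


def texture_bits(qstep: float, mad: float, x1: float, x2: float) -> float:
    """Quadratic R-Q model: T = x1 * mad / Qs + x2 * mad / Qs^2."""
    return x1 * mad / qstep + x2 * mad / qstep ** 2


def solve_qstep(target_bits: float, mad: float, x1: float, x2: float) -> float:
    """Solve T * Qs^2 - x1 * mad * Qs - x2 * mad = 0 for the positive root."""
    if target_bits <= 0 or mad <= 0:
        raise ValueError("target_bits and mad must be positive")
    disc = (x1 * mad) ** 2 + 4.0 * target_bits * x2 * mad
    if x2 == 0 or disc < 0:
        qstep = x1 * mad / target_bits  # assumed fallback: linear term only
    else:
        qstep = (x1 * mad + math.sqrt(disc)) / (2.0 * target_bits)
    if qstep <= 0:
        raise ValueError("model parameters give no positive step size")
    return qstep


if __name__ == "__main__":
    qs = solve_qstep(target_bits=480.0, mad=4.0, x1=1000.0, x2=2000.0)
    print(qs, qstep_to_qp(qs))
```

In a real encoder the resulting QP would normally also be clipped against the QP of neighboring frames to limit quality fluctuation; that step is omitted here.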
Through the observation of a large number of video sequences encoded with constant QP, we notice that there exists a linear relationship between MAD and generated bits for B frames in the same TL, and that the linear relationships in different TLs are independent of each other. Figures 3(a) and (b) show the relationships between MAD and generated bits for P frames and B frames. The video sequences are “Mobile & Calendar” and “Foreman” (CIF size), and 289 frames are encoded for both. The simulation conditions for both sequences are the same as those in Fig. 1. In Fig. 3, the symbols of the P frames and of the B frames in the three TLs cluster in different regions, and the larger the MAD value is, the more bits are generated for a picture in each TL. The lines stand for the corresponding linear models obtained with least-squares fitting for each set of symbols in one TL. Here, Eq. (6) is employed to approximate the R-MAD relationship for each TL:
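The per-TL least-squares fitting described above can be sketched as follows. The sketch assumes that each TL gets its own line of the form bits = a·MAD + b; the symbols `a` and `b`, the function names, and the sample values are made up for illustration and are not measurements from the paper.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least-squares fit of bits = a * mad + b; returns (a, b)."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two samples are needed")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all MAD values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fit_per_level(samples):
    """samples: iterable of (tl, mad, bits). Returns {tl: (a, b)}."""
    groups = defaultdict(list)
    for tl, mad, bits in samples:
        groups[tl].append((mad, bits))
    return {tl: fit_line(points) for tl, points in groups.items()}


if __name__ == "__main__":
    # Synthetic (TL, MAD, bits) samples at one constant QP, for illustration only.
    samples = [
        (1, 2.0, 5000.0), (1, 4.0, 9000.0),
        (3, 1.0, 600.0), (3, 2.0, 1100.0), (3, 3.0, 1600.0),
    ]
    for tl, (a, b) in sorted(fit_per_level(samples).items()):
        print(f"TL {tl}: bits = {a:.1f} * MAD + {b:.1f}")
```

Grouping the samples by TL before fitting is what keeps the models independent: a change in the statistics of one level does not shift the line fitted for another.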
With the constant QP varied from 16 to 36 in steps of 2, Fig. 4(a) shows the R-MAD relationship of P frames for the various QP values. From Figs. 4(b)-(d), we notice that the B frames in each TL have an R-MAD distribution similar to that of P frames. In Refs. [4,6], the R-Q model for P frames is described as:

$$T_p = X_{p,1}\,\frac{\delta_p}{Q_s} + X_{p,2}\,\frac{\delta_p}{Q_s^{2}} \qquad (7)$$
where $T_p$, $\delta_p$, $X_{p,1}$, and $X_{p,2}$ are the target bits, the estimated MAD, and the two model parameters for a P frame, respectively. In contrast to Eq. (6), can be modeled by
Consequently, the independent model for B frames in each TL is described as follows:
It should be mentioned that, besides the R-Q model parameters of the B frames, the model parameters of P frames should also be updated independently according to the characteristics of previous P frames.
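One way to keep the R-Q parameters of each TL independent is to store a separate sliding window of coding statistics per TL and refit each model only from its own window. The sketch below is an assumption-laden illustration, not the authors' update rule: it assumes the quadratic form T = X1·δ/Qs + X2·δ/Qs² for every TL and estimates the parameters by linear regression on the transformed variables y = T·Qs/δ and x = 1/Qs. The class name and the window size are hypothetical.

```python
from collections import defaultdict, deque


class IndependentRQModels:
    """Per-temporal-level quadratic R-Q parameters with sliding windows."""

    def __init__(self, window: int = 20):
        # One bounded history per TL; TL 0 holds the P frames.
        self._history = defaultdict(lambda: deque(maxlen=window))

    def update(self, tl: int, qstep: float, mad: float, bits: float) -> None:
        """Record the statistics of a frame just encoded in level tl."""
        if qstep <= 0 or mad <= 0:
            raise ValueError("qstep and mad must be positive")
        # T = x1*mad/Qs + x2*mad/Qs^2  =>  T*Qs/mad = x1 + x2*(1/Qs)
        self._history[tl].append((1.0 / qstep, bits * qstep / mad))

    def params(self, tl: int):
        """Return (x1, x2) fitted only from frames of level tl."""
        points = self._history.get(tl)
        if not points:
            raise KeyError(f"no frames recorded for TL {tl}")
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in points)
        if sxx == 0:
            return mean_y, 0.0  # single step size seen: first-order model only
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        x2 = sxy / sxx
        return mean_y - x2 * mean_x, x2


if __name__ == "__main__":
    models = IndependentRQModels()
    models.update(tl=0, qstep=10.0, mad=4.0, bits=480.0)
    models.update(tl=0, qstep=20.0, mad=4.0, bits=220.0)
    models.update(tl=3, qstep=16.0, mad=2.0, bits=41.40625)
    models.update(tl=3, qstep=32.0, mad=2.0, bits=19.7265625)
    print(models.params(0), models.params(3))
```

Because each level reads only its own history, B frames in a high TL no longer inherit parameters tuned to P frames, which is the limitation of the shared model that the text points out.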
The proposed algorithm is compared with the Mode 2 and Mode 3 [6] (RC_MODE_2 and RC_MODE_3) RC algorithms adopted in JM 14.2 [10], which are the benchmark schemes modified from JVT-G012 [4]. For Seo's RC algorithm [8], the R-Q model parameters σ, ξ1, and ξ2 must be fixed before the model is updated. However, appropriate values of these model parameters are not explicitly supplied for different sequences and different resolutions in Ref. [8]. Therefore, we did not compare our proposed algorithm with Seo's.
Table 1 illustrates the coding results on two sets of sequences. The first set consists of the CIF sequences “Foreman”, “Mobile & Calendar”, and “News”. The second set consists of the 720P sequences “Crew” and “City”. These video sequences have different motion types from low to high and different spatial details from simple to complex. For each video sequence, 231 frames are encoded, the GOP structure is HGOP 8, the IDR period is 32 frames, and the frame rate is 30 fps. The motion vector search range and the number of multiple reference frames for motion estimation are set to ±32 and 5, respectively. The context-adaptive binary arithmetic coding (CABAC) and R-D optimization are enabled. The initial QP is set identically. For the sequences in 720P size, the motion estimation uses the UMHexagonS [11] mode with a search range of 48, and the number of reference frames is 2. Since its QP determination for hierarchical B frames is not based on the R-Q model, RC_MODE_2 has difficulty setting the QP for B frames according to the target bitrate, and it generates a bitstream with a large mismatch from the target bits. Conversely, RC_MODE_3 is able to maintain the bitrate accurately according to the target budget. The proposed RC algorithm achieves better R-D performance than RC_MODE_3, with an average PSNR gain of 0.408 dB for the CIF sequences and 0.188 dB for the 720P sequences, while keeping the same bitrates. Table 1 shows that in most cases, the proposed algorithm is more effective than RC_MODE_2 and RC_MODE_3.
Table 1 Performance of RC algorithms in terms of bitrate and PSNR
Figure 5 also shows the experimental results for “Paris” in CIF format. Because of its large target bits mismatch, RC_MODE_2 is not included in the comparison. The simulation conditions for “Paris” in Fig. 5 are the same as those in Table 1. Figure 5(a) shows PSNR versus frames at 512 kB/s. The proposed approach clearly provides a smoother PSNR across frames. Figure 5(b) shows the R-D curves obtained with the proposed algorithm and RC_MODE_3, respectively. An improvement of roughly 0.253 dB is achieved by the proposed RC algorithm. In conclusion, our proposed algorithm can provide better R-D performance.
Fig. 5 Comparisons of bitrate and performance for “Paris”
In this article, we proposed a simple but effective algorithm to improve the RC for the hierarchical picture structure. The algorithm employs a novel MAD estimation algorithm and an independent R-Q model to determine QP for B frames in different TLs. Our simulation results showed that the proposed algorithm can improve the coding efficiency and reduce the fluctuation of visual quality. Furthermore, the proposed improvements could be extended to a complete RC scheme by setting more appropriate bit budgets for the different slices in the hierarchical structure and by estimating the complexity more accurately at the MB level.
[1] Schwarz H, Marpe D, Wiegand T. Analysis of Hierarchical B Pictures and MCTF [C]. IEEE International Conference on Multimedia and Expo, Toronto, Canada, 2006: 1929-1932.
[2] Schwarz H, Marpe D, Wiegand T. Overview of the Scalable Video Coding Extension of the H.264/AVC Standard [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2007, 17(9): 1103-1120.
[3] Huang H C, Peng W H, Chiang T H, et al. Advances in the Scalable Amendment of H.264/AVC [J]. IEEE Communications Magazine, 2007, 45(1): 68-76.
[4] Li Z G, Gao W, Pan F, et al. Adaptive Rate Control for H.264 [J]. Journal of Visual Communication and Image Representation, 2006, 17(2): 376-406.
[5] Lim S C, Na H R, Lee Y L. Rate Control Based on Linear Regression for H.264/MPEG-4 AVC [J]. Signal Processing: Image Communication, 2007, 22(1): 39-58.
[6] Leontaris A, Tourapis A M. Rate Control Reorganization in the Joint Model (JM) Reference Software [C]. Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, JVT-W042, San Jose, California, 2007.
[7] Li M, Chang Y L, Yang F Z, et al. Frame Layer Rate Control for H.264/AVC with Hierarchical B-Frames [J]. Signal Processing: Image Communication, 2009, 24(3): 177-199.
[8] Seo C W, Kang J W, Han J K, et al. Efficient Bit Allocation and Rate Control Algorithms for Hierarchical Video Coding [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2010, 20(9): 1210-1223.
[9] Xie B, Zeng W J. A Sequence-Based Rate Control Framework for Consistent Quality Real-Time Video [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2005, 16(1): 56-71.
[10] Joint Model Reference Software Version 14.2 [CP/OL]. (2011-01-06) [2011-08-01]. http://iphome.hhi.de/suehring/tml/.
[11] Rahman C A, Badawy W. UMHexagonS Algorithm Based Motion Estimation Architecture for H.264/AVC [C]. Proceedings of the 5th International Workshop on System-on-Chip for Real-Time Applications, Banff, Canada, 2005: 207-210.
Journal of Donghua University (English Edition), 2012, Issue 3