An increasing number of services and the growing popularity of high-definition video are creating a strong need for higher coding efficiency. Analysis of the design, together with performance metrics such as peak signal-to-noise ratio (PSNR) and subjective quality tests, indicates that HEVC encoders can achieve approximately 50% bit-rate reduction relative to H.264/MPEG-4 AVC. The HEVC design is shown to be especially effective for low bit rates, high-resolution video content, and low-delay communication applications. This paper also provides an overview of, and summarizes emerging studies on, the coding features of H.262/MPEG-2 Video, H.263, and MPEG-4 Visual.
Coding efficiency is the ability to maximize the video quality achievable within a given available bit rate. The goal of this paper is to analyze the coding efficiency that can be achieved by the use of the emerging High Efficiency Video Coding (HEVC) standard [1]–[3], relative to the coding efficiency characteristics of its major predecessors, including H.262/MPEG-2 Video [4], H.263 [5], MPEG-4 Visual [6], and H.264/MPEG-4 Advanced Video Coding (AVC) [7], [8]. The emerging HEVC design is analyzed using a systematic approach that is largely similar in spirit to the one previously applied to the analysis of the first version of H.264/MPEG-4 AVC in [9]. A major emphasis in this analysis is the application of a disciplined and uniform approach for optimization of each of the video encoders.
The High Efficiency Video Coding (HEVC) standard is the most recent joint video project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardization organizations, working together in a partnership known as the Joint Collaborative Team on Video Coding (JCT-VC) [2]. The major video coding standard directly preceding the HEVC project was H.264/MPEG-4 AVC, which was initially developed in the period between 1999 and 2003 and then extended in several important ways from 2003 to 2009. H.264/MPEG-4 AVC has been an enabling technology for digital video in almost every area that was not previously covered by H.262/MPEG-2 Video and has substantially displaced the older standard within its existing application domains. It is widely used for many applications, including broadcast of High Definition (HD) TV signals over satellite, cable, and terrestrial transmission systems, video content acquisition and editing systems, camcorders, security applications, Internet and mobile network video, Blu-ray Discs, and real-time conversational applications such as video chat, video conferencing, and telepresence systems. HEVC has been designed to address essentially all existing applications of H.264/MPEG-4 AVC and to particularly focus on two key issues: increased video resolution and increased use of parallel processing architectures. As has been the case for all past ITU-T and ISO/IEC video coding standards, in HEVC only the bit stream structure and syntax are standardized, as well as constraints on the bit stream and its mapping for the generation of decoded pictures.
The paper is organized as follows. Section 1 reviews the investigated video coding standards and highlights the main coding tools that contribute to the coding efficiency improvement from one standard generation to the next. Section 2 describes the proposed approach. Section 3 presents the current performance of the HEVC reference implementation together with a few experimental results, and Section 4 concludes the paper.
The basic design of all major video coding standards since H.261 (in 1990) [10] follows the so-called block-based hybrid video coding approach. Each block of a picture is either intrapicture coded (also known as coded in an intra coding mode), without referring to other pictures of the video sequence, or it is temporally predicted (i.e., inter-picture coded, also known as coded in an inter coding mode), where the prediction signal is formed by a displaced block of an already coded picture. The latter technique is also referred to as motion-compensated prediction and represents the key concept for utilizing the large amount of temporal redundancy in video sequences. The prediction error signal (or the complete intra-coded block) is processed using transform coding for exploiting spatial redundancy. The transform coefficients that are obtained by applying a decorrelating (linear or approximately linear) transform to the input signal are quantized and then entropy coded together with side information such as coding modes and motion parameters.
H.262/MPEG-2 Video was developed as an official joint project of ITU-T and ISO/IEC JTC 1. It was finalized in 1994 and is still widely used for digital television and the DVD Video optical disc format. As in its predecessors H.261 [10] and MPEG-1 Video [11], each picture of a video sequence is partitioned into macroblocks (MBs), each of which consists of a 16 × 16 luma block and, in the 4:2:0 chroma sampling format, two associated 8 × 8 chroma blocks. This standard defines three picture types: I, P, and B pictures. I and P pictures are always coded in display/output order. The most widely implemented profile of H.262/MPEG-2 Video is the Main Profile (MP). It supports video coding with the 4:2:0 chroma sampling format and includes all tools that significantly contribute to coding efficiency.
The first version of ITU-T Rec. H.263 [5] defines syntax features that are very similar to those of H.262/MPEG-2 Video, but it includes some changes that make it more efficient for low-delay, low bit-rate coding. The coding of motion vectors has been improved by using the component-wise median of the motion vectors of three neighboring previously decoded blocks as the motion vector predictor. The transform coefficient levels are coded using a 3-D run-level-last VLC, with tables optimized for lower bit rates. The H.263 profiles that provide the best coding efficiency are the Conversational High Compression (CHC) profile and the High Latency Profile (HLP). The CHC profile includes most of the optional features (Annexes D, F, I, J, T, and U) that provide enhanced coding efficiency for low-delay applications.
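The component-wise median prediction described above can be illustrated with a minimal sketch. The function names and the tuple representation of a motion vector are assumptions made for illustration only; they are not taken from any standard text or reference software, and the selection of the three neighboring blocks is outside the scope of the sketch.

```python
from typing import Tuple

MotionVector = Tuple[int, int]  # (horizontal, vertical) displacement; hypothetical representation


def median_mv_predictor(a: MotionVector, b: MotionVector, c: MotionVector) -> MotionVector:
    """Component-wise median of the motion vectors of three neighboring blocks."""
    xs = sorted((a[0], b[0], c[0]))
    ys = sorted((a[1], b[1], c[1]))
    # The middle element of each sorted triple is the median of that component.
    return (xs[1], ys[1])


def mv_difference(mv: MotionVector, predictor: MotionVector) -> MotionVector:
    """Difference between the actual vector and its predictor (the part that is coded)."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])
```

Because the two components are treated independently, the predictor need not equal any one of the three neighboring vectors.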
MPEG-4 Visual [6], also known as Part 2 of the MPEG-4 suite, is backward-compatible with H.263 in the sense that each conforming MPEG-4 decoder must be capable of decoding H.263 Baseline bit streams (i.e., bit streams that use no H.263 optional annex features).
As with Annex F in H.263, the inter prediction in MPEG-4 Visual can be done with 16 × 16 or 8 × 8 blocks. For the comparisons in this paper, the authors used the Advanced Simple Profile (ASP) of MPEG-4 Visual, which includes all relevant coding tools. They generally enabled quarter-sample-precision motion vectors.
H.264/MPEG-4 AVC [7], [8] is the second video coding standard that was jointly developed by ITU-T VCEG and ISO/IEC MPEG. It still uses the concept of 16 × 16 MBs, but contains many additional features. One of the most obvious differences from older standards is its increased flexibility for inter coding. H.264/MPEG-4 AVC also supports multiple reference pictures. In general, motion vectors are predicted by the component-wise median of the motion vectors of three neighboring previously decoded blocks. Instead of I, P, and B pictures, the standard actually specifies I, P, and B slices. A picture can contain slices of different types, and a picture can be used as a reference for inter prediction of subsequent pictures independently of its slice coding types. This generalization allows the use of prediction structures such as hierarchical B pictures, which show improved coding efficiency compared to the IBBP coding typically used for H.262/MPEG-2 Video.
For entropy coding of all MB syntax elements, H.264/MPEG-4 AVC specifies two methods. The first, known as Context-Adaptive Variable-Length Coding (CAVLC), uses a single codeword set for all syntax elements except the transform coefficient levels. Its efficiency is improved by switching between VLC tables depending on the values of previously transmitted syntax elements. The second method is Context-Adaptive Binary Arithmetic Coding (CABAC), which improves coding efficiency relative to CAVLC. The High Profile (HP) of H.264/MPEG-4 AVC includes all tools that contribute to the coding efficiency for 8-bit-per-sample video in 4:2:0 format, and is used for the comparison in this paper.
In HEVC [17], a picture is partitioned into Coding Tree Blocks (CTBs). The size of the CTBs can be chosen by the encoder according to its architectural characteristics and the needs of its application environment, which may impose limitations such as encoder/decoder delay constraints and memory requirements. The luma CTB and the two chroma CTBs, together with the associated syntax, form a Coding Tree Unit (CTU). The CTU is the basic processing unit of the standard to specify the decoding process (conceptually corresponding to an MB in prior standards). The blocks specified as luma and chroma CTBs can be further partitioned into multiple Coding Blocks (CBs). The size of the CB can range from the same size as the CTB to a minimum size (8×8 luma samples or larger) that is specified by a syntax element conveyed to the decoder. The luma CB and the chroma CBs, together with the associated syntax, form a Coding Unit (CU). Similar to H.264/MPEG-4 AVC, HEVC supports quarter sample precision motion vectors.
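The partitioning of a CTB into CBs can be sketched as a recursive quadtree split, which is how HEVC signals it. The sketch below is illustrative only: the function name, the `(x, y, size)` block representation, and the `should_split` callback (standing in for whatever decision the encoder makes) are hypothetical and do not correspond to the standard's syntax or to any reference encoder.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) in luma samples; hypothetical representation


def split_ctb(x: int, y: int, size: int, min_cb_size: int,
              should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively split a square block into four quadrants until the
    encoder's decision (should_split) says stop or the minimum CB size is reached."""
    if size > min_cb_size and should_split(x, y, size):
        half = size // 2
        blocks: List[Block] = []
        for dy in (0, half):          # top row of quadrants first, then bottom row
            for dx in (0, half):
                blocks.extend(split_ctb(x + dx, y + dy, half, min_cb_size, should_split))
        return blocks
    return [(x, y, size)]             # leaf of the quadtree: one coding block
```

For example, calling `split_ctb(0, 0, 64, 8, lambda x, y, size: x == 0 and y == 0)` keeps splitting only the top-left quadrant, which yields three 32 × 32 blocks, three 16 × 16 blocks, and four 8 × 8 blocks that together cover the 64 × 64 CTB.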
In general, the transforms represent integer approximations of a DCT. For luma intra transform blocks (TBs) of size 4 × 4, an alternative transform representing an approximation of a discrete sine transform is used. The HEVC Main Profile (MP) includes all the coding tools described above and supports the coding of 8-bit-per-sample video in 4:2:0 chroma format. For some comparisons in this paper, the authors used a modified configuration in which some coding tools are disabled.
The main focus of the comparison is to investigate the coding efficiency that is achievable by the bitstream syntax. PSNR versus bit-rate measurements are presented, comparing the coding efficiency of HEVC and H.264/MPEG-4 AVC when encoding with Lagrangian-based optimization techniques. The objective is to compare the coding efficiency of HEVC with that of H.264/MPEG-4 AVC when each encoder is optimized. All sequences are progressively scanned and use the YUV 4:2:0 color format with 8 bits per color sample.
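For reference, the PSNR of a single 8-bit sample plane can be computed as in the following minimal sketch. The function name and the flat-sequence input are assumptions for illustration; how the luma and chroma PSNR values are combined into a single figure is not stated in this section, so the sketch deliberately handles one plane only.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], reconstructed: Sequence[int], bit_depth: int = 8) -> float:
    """PSNR in dB between two equally sized sample planes."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("planes must be non-empty and of equal size")
    peak = (1 << bit_depth) - 1  # 255 for 8-bit samples
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf          # identical planes: PSNR is unbounded
    return 10.0 * math.log10(peak * peak / mse)
```

Plotting this value against the bit rate for several quantization settings gives the kind of rate-distortion curve on which PSNR versus bit-rate comparisons are based.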
The task of an encoder control for a particular coding standard is to determine the values of the syntax elements, and thus the bit stream b, for a given input sequence s in such a way that the distortion D(s, s') between the input sequence s and its reconstruction s' = s'(b) is minimized subject to a set of constraints, which usually includes constraints on the average and maximum bit rate and the maximum coding delay. The concept of the described Lagrangian encoder control is applied to mode decision, motion estimation, and quantization. The minimization of a Lagrangian cost function for mode decision was proposed in [12], [13]. For the investigated encoders, the described mode decision process is used for the following
The minimization of a Lagrangian cost function for motion estimation was proposed in [14], [15]. In HEVC, the motion vector predictor for a block is not fixed, but can be chosen out of a set of candidate predictors. The predictor actually used is determined by minimizing the number of bits required for coding the motion vector m.
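The predictor selection described above can be sketched as follows. The bit-cost model is an assumption: the length of a signed Exp-Golomb code is used here as a simple stand-in for the number of bits needed to code each component of the motion vector difference, and the cost of signaling the chosen candidate is ignored. It is not the entropy coding actually used by HEVC, and all function names are hypothetical.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]


def signed_exp_golomb_bits(value: int) -> int:
    """Length in bits of a signed Exp-Golomb code (assumed stand-in bit-cost model)."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return 2 * (code_num + 1).bit_length() - 1


def mv_bits(mv: MotionVector, predictor: MotionVector) -> int:
    """Approximate bits needed to code mv as a difference from the predictor."""
    return (signed_exp_golomb_bits(mv[0] - predictor[0])
            + signed_exp_golomb_bits(mv[1] - predictor[1]))


def choose_predictor(mv: MotionVector, candidates: List[MotionVector]) -> Tuple[int, int]:
    """Return (index, bits) of the candidate predictor that minimizes the bit cost."""
    costs = [mv_bits(mv, p) for p in candidates]
    best = min(range(len(candidates)), key=costs.__getitem__)
    return best, costs[best]
```

With this cost model, a candidate close to the actual motion vector leaves a small difference and therefore a short code, so it is the one selected.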
For all the results presented in this paper, the Quantization Parameter QP and the Lagrange multiplier λ are held constant for all MBs or CUs of a video picture. The Lagrange multiplier is set according to λ = α · Q^2, where Q denotes the quantization step size, which is controlled by the Quantization Parameter QP [16].
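A minimal sketch of how these quantities fit together is given below, under several stated assumptions. The Lagrangian cost is taken in its usual form J = D + λ · R (distortion plus λ times rate). The step size is assumed to double for every increase of 6 in QP, which is the convention of H.264/MPEG-4 AVC and HEVC and does not apply to the older standards. The value of α is encoder-specific and is left as a parameter. The class, function names, and candidate modes are hypothetical.

```python
from typing import Dict, NamedTuple


class Candidate(NamedTuple):
    distortion: float  # e.g. sum of squared differences for the block
    rate_bits: float   # bits needed to code the block in this mode


def quant_step(qp: int) -> float:
    """Quantization step size Q, doubling every 6 QP values (assumed convention)."""
    return 2.0 ** ((qp - 4) / 6.0)


def lagrange_multiplier(qp: int, alpha: float) -> float:
    """lambda = alpha * Q^2, with alpha an encoder-chosen constant (placeholder)."""
    q = quant_step(qp)
    return alpha * q * q


def choose_mode(candidates: Dict[str, Candidate], lam: float) -> str:
    """Return the name of the candidate with the smallest cost J = D + lambda * R."""
    return min(candidates,
               key=lambda name: candidates[name].distortion + lam * candidates[name].rate_bits)
```

A small λ favors candidates with low distortion even if they cost many bits, whereas a large λ (coarse quantization) shifts the decision toward cheap modes. Because λ grows with Q^2, raising QP by 6 multiplies λ by four.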
The first experiment addresses interactive video applications such as video conferencing. The authors selected six test sequences with typical video conferencing content. The average bit-rate savings between the different codecs, which are computed over the entire test set and the investigated quality range, are summarized in Table 1. These results indicate that the emerging HEVC standard clearly outperforms its predecessors in terms of coding efficiency for interactive applications.
Besides the interactive applications, one of the most promising application areas for HEVC is the coding of high-resolution video with entertainment quality. The bit-rate savings results, averaged over the entire set of test sequences and the examined quality range, are summarized in Table 2.
Table 1. Average Bit-rate Savings For Equal PSNR For Interactive Applications
Table 2. Average Bit-rate Savings For Equal PSNR For Entertainment Applications
The subjective test results are further analyzed to obtain a finer and more precise measure of the coding performance gains of the HEVC standard. There is a set of four MOS values per sequence per codec. By comparing the bit rates at the same MOS values, the bit-rate savings achieved by HEVC relative to H.264/MPEG-4 AVC can be calculated for any given MOS value. An example is shown in Figure 1. These graphs show the bit-rate savings for the HEVC MP relative to the H.264/MPEG-4 AVC HP at different MOS values.
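The calculation of a bit-rate saving at equal MOS can be sketched as follows. The interpolation rule is an assumption: between measured points, the bit rate is interpolated linearly in the logarithmic domain, which may differ from the procedure actually used for the reported results. The function names and the `(bit rate, MOS)` point representation are hypothetical.

```python
import math
from typing import List, Tuple

RatePoint = Tuple[float, float]  # (bit rate, MOS); hypothetical representation


def rate_at_mos(points: List[RatePoint], mos: float) -> float:
    """Interpolate the bit rate needed to reach a MOS value (linear in log-rate, assumed)."""
    pts = sorted(points, key=lambda p: p[1])
    for (r0, m0), (r1, m1) in zip(pts, pts[1:]):
        if m0 <= mos <= m1:
            if m1 == m0:
                return r0
            t = (mos - m0) / (m1 - m0)
            return math.exp(math.log(r0) + t * (math.log(r1) - math.log(r0)))
    raise ValueError("MOS value lies outside the measured range")


def bitrate_saving(test: List[RatePoint], reference: List[RatePoint], mos: float) -> float:
    """Fraction of bit rate saved by the test codec at equal MOS (0.5 means 50%)."""
    return 1.0 - rate_at_mos(test, mos) / rate_at_mos(reference, mos)
```

If the test codec reaches every MOS value at half the bit rate of the reference codec, `bitrate_saving` returns 0.5 for any MOS value within the measured range. Savings can only be evaluated where the MOS ranges of the two codecs overlap.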
Table 3 shows the computed bit-rate savings of the HEVC MP relative to the H.264/MPEG-4 AVC HP. The savings range from around 30% to nearly 67%, depending on the video sequence. The average bit-rate reduction over all the sequences tested was 49.3%.
Figure 1. Bit-rate savings as a function of subjective quality
Table 3. Average Bit-rate Savings For Entertainment Applications Scenario Based On Subjective Assessment Results
The emerging HEVC standard can provide a significant increase in coding efficiency compared to previous standards, including H.264/MPEG-4 AVC. The syntax and coding structures of the various tested standards are explained, and the associated Lagrangian-based encoder optimization is described. Finally, results of subjective tests comparing HEVC and H.264/MPEG-4 AVC are provided, indicating that a bit-rate reduction of about 50% can be achieved for the example video test set.