Ulusal Tez Merkezi

Thesis No	Download	Thesis Record	Status
402041		Reduced-complexity disparity estimation for efficient multiview imagery encoding / Author: AYKUT AVCI Advisor: PROF. DR. HERBERT DE SMET ; DR. JAN DE COCK Location: Ghent University / Foreign Institute Subject: Computer Engineering and Computer Science and Control ; Science and Technology Index:	Approved Doctoral English 2013 163 pp.



3D display technology has witnessed rapid development in the past decades. Currently, 3D displays are widely used in application areas such as education, broadcasting, entertainment, surgery, video conferencing, etc. However, this technology owes its success to complementary technologies such as image acquisition, compression and transmission. Compression is the main topic of this dissertation.

In multiview displays, the realism of the reproduced 3D scene depends on the number of available views that the displays can show. These view images can be captured from different viewpoints of a scene by using a camera array. A smoother transition between views can be obtained by increasing the number of cameras in the camera array. However, this comes at the price of an increased amount of image data, which needs to be encoded (compressed) to store and transmit the data efficiently.

The captured multiview videos for different views can be encoded separately by a state-of-the-art video codec like H.264/AVC, which is called simulcast coding. Although coding each video individually is an easy option to solve the problem, it is not the most efficient approach since the inter-view correlations between views are overlooked. In order to improve the efficiency of the encoder, the inter-view correspondences can be taken into account. However, in this case, the computational load of the encoder becomes very high, especially if the camera array is two-dimensional, i.e. if vertically spaced views are also being captured. Limitations on processing power, memory requirement and the desirability of features like instant access to a specific view frame in the multiview video may render this scheme unusable.

A periodic (2D) camera array results in a strong geometrical relationship among the captured view images.
This fact forms the core of the methods I propose in this dissertation, which consequently reduce the complexity of the multiview encoder significantly.

The P and B frames are well-known frame types from the H.264/AVC video coding standard and are instrumental in the motion estimation process that exploits the similarity between consecutive frames in the time domain. A similar process called disparity estimation can be used to exploit the similarity between views. Since the B frame offers bi-directional prediction, it shows a better coding performance than the P frame but brings a high computational load to the encoder. The complexity-efficient versions of these frame types, named the Dp and Db frame respectively, are introduced in this dissertation.

The disparity estimation process is the most complex and time-consuming part of the encoder. The Dp frame achieves a significant complexity reduction by skipping the disparity estimation process for some of its blocks. The skipping process is entirely based on the fact that, for most blocks in a Dp frame, the disparity vectors can be derived from the previously encoded blocks in another frame due to the strong geometrical relationship between views. A derived disparity vector needs to be checked for its fidelity since the derivation process can fail in some blocks due to occlusions, anisotropic illumination effects or insufficient texture information. To do this, the rate-distortion (RD) cost value of the derived disparity vector is compared with a threshold value. Blocks for which the RD cost of the derived disparity vector is lower than the threshold value are exempted from the disparity estimation process, which results in a net complexity gain for the encoder.

The threshold value plays a crucial role in determining the suitability of the derived disparity vectors.
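The skip decision described above can be illustrated with a minimal sketch. It is not the JSVM implementation: the function names, the Lagrangian form of the RD cost (distortion plus lambda times rate) and the callback used for the full search are all assumptions made for illustration.

```python
from typing import Callable, Tuple

Vector = Tuple[int, int]


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian RD cost J = D + lambda * R (assumed form, not taken from the source)."""
    return distortion + lam * rate_bits


def choose_disparity_vector(
    derived_dv: Vector,
    derived_cost: float,
    threshold: float,
    full_search: Callable[[], Vector],
) -> Tuple[Vector, bool]:
    """Return (disparity vector, searched).

    The derived vector is accepted only when its RD cost is strictly lower
    than the threshold; otherwise the regular disparity estimation runs.
    `full_search` is a hypothetical stand-in for that estimation.
    """
    if derived_cost < threshold:
        return derived_dv, False  # disparity estimation skipped for this block
    return full_search(), True    # fall back to the normal search
```

With a threshold of zero no non-negative cost can pass the check, so every block goes through the full search.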
Since the Dp frame is a modified standard-conforming P frame, the encoder shows the same coding performance as the P frame if the threshold value is set equal to zero. This means that none of the derived disparity vectors are used and the disparity estimation will be performed for all the blocks in the frame. As the threshold value increases, the complexity of the encoder decreases, while the quality and bitrate of the encoded images degrade. Therefore, the threshold value is a parameter to adjust the trade-off between the complexity and the rate-distortion performance of the encoder.

I introduced five alternative prediction schemes to encode 5x3 view images taken from a 2D camera array, three of which were constructed with the Dp frame. It has been noticed that the different locations of the frames have an influence on the rate-distortion performance of the encoder. The prediction scheme in which the I frame is placed in the middle gives the best performance.

After investigating the impact of a wide range of threshold values on the encoding performance and the complexity, it has been realized that the complexity of the encoder can be substantially reduced without compromising the quality and bitrate. The optimum values of the threshold should be calculated depending on the quantization parameter (QP) and the imagery content since the threshold value represents a point in the rate-distortion cost value scale.

The rate-distortion and the complexity performance of the multiview encoder are improved by applying individual threshold values for every block in a Dp frame. I propose a method which automatically calculates the optimum threshold values for blocks during the encoding, where the maximum complexity gain is achieved while maintaining the rate-distortion performance. In order to calculate the optimum threshold value of a block, the rate-distortion cost value of a previously encoded block from which the disparity vector is derived is utilized.
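The abstract does not state how the per-block threshold is computed from the reference block's RD cost. The sketch below therefore assumes the simplest possible rule, a threshold proportional to that cost. The `scale` parameter and both function names are hypothetical and only illustrate the idea of a block-specific threshold.

```python
from typing import List, Tuple


def block_threshold(reference_rd_cost: float, scale: float = 1.0) -> float:
    """Per-block threshold from the RD cost of the block the vector was derived from.

    The proportional rule (scale * cost) is an assumption for illustration only.
    """
    if reference_rd_cost < 0 or scale < 0:
        raise ValueError("RD cost and scale must be non-negative")
    return scale * reference_rd_cost


def count_skipped_blocks(blocks: List[Tuple[float, float]], scale: float = 1.0) -> int:
    """Count blocks whose derived-vector cost falls below their own threshold.

    Each entry of `blocks` is (derived_cost, reference_cost).
    """
    return sum(
        1 for derived_cost, reference_cost in blocks
        if derived_cost < block_threshold(reference_cost, scale)
    )
```

Under this assumed rule a larger `scale` lets more blocks skip disparity estimation, while `scale = 0` reproduces the behaviour of a zero threshold, where every block is searched.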
Basically, the B frame has a better coding efficiency than the P frame. When the B frame is employed in a prediction scheme to encode multiview images, the complexity of the encoder is much higher than with prediction schemes that use only P frames. In this dissertation, the complexity-efficient version of the B frame, called the Db frame, is also presented. Different prediction schemes constructed with the Dp and the Db frames are proposed. With the help of the Db frame, the computational load of the multiview encoder is reduced considerably. Threshold values for the blocks in Db frames are generated automatically during the encoding.

The proposed frame types allow us to encode multiview images effectively with a lower computational load while keeping the same quality and bitrate. All proposed prediction schemes are applied to different multiview image sets containing various real-world objects. For this purpose, all ideas in this dissertation have been implemented in the JSVM reference software.