2013 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), Melaka, Malaysia, 8-10 October 2013

Depth Error Concealment Based on Decision Making

M. Ranjbari #1, A. Sali *2, H. A. Karim #3, F. Hashim #4

#1,2,3 Department of Computer and Communication Systems Engineering, Universiti Putra Malaysia, 43400 UPM Serdang, MALAYSIA

#4 Faculty of Engineering, Multimedia University, Persiaran Multimedia, 63100 Cyberjaya, Selangor Darul Ehsan, MALAYSIA

1 [email protected] 2 [email protected]

3 [email protected] 4 [email protected]

Abstract: A common way of representing stereoscopic video is to combine a 2D video with its corresponding depth map, which is captured by a laser camera to convey depth in the scene. When this type of video is transmitted over error-prone channels, packet loss leads to frame loss, and most of the lost frames are depth frames. We therefore propose a depth error concealment method based on decision making, termed DM-PV, which exploits the high correlation between the 2D image and its corresponding depth map. The 2D image provides information about the missing frame in the depth sequence and guides the decision on how to conceal it. Concealment consists of either inserting a suitable blank frame or duplicating the previous frame in place of the missing depth frame. PSNR improves over the frame copy method, which involves no decision making. Furthermore, the subjective quality of the stereoscopic video is better with DM-PV.

I. INTRODUCTION

Interest in new media is growing, and 3D TV is one of the most popular of these technologies. Demand for 3D video broadcasting is rising among technology enthusiasts, and 3D TV is believed to be the next generation of home entertainment [1]. Wireless communications play a significant role in broadcasting 3DTV. Despite their ease of access, they introduce many challenges for broadcasting. Wireless links are affected by noise and environmental interference, which introduce various kinds of errors into the signal, and most broadcasting protocols discard packets that contain errors. As a result, packet loss is more frequent in wireless systems.

3D video is known to be challenging content for transmission and streaming because of the extra information related to the depth layer. The approach generally used to transmit 3D video is depth-image-based rendering (DIBR) [2]. DIBR uses two streams: a normal 2D stream and its associated depth map, which is created from depth information. Each pixel in the depth map has an 8-bit value that represents the distance between the captured object and the camera; an example is shown in Fig. 1. Such a system must transmit the monoscopic video along with its associated depth map. A high-quality video from a different viewpoint is then created by synthesizing the two streams at the decoder.

Fig. 1. Left picture is colour video and the right one is its depth information.

Many methods have been proposed to improve quality of service and to cope with the extra information in 3D video transmission. Most of them assign higher importance to the colour frames than to the depth frames because of the greater visual importance of colour video. Coding methods and channel assignment techniques mostly focus on delivering the 2D colour frames. Most methods prefer to degrade the depth map quality in order to reduce redundant depth information. In [3], to reduce the extra information in 3D video transmission, the depth map is down-sampled before encoding and up-sampled after transmission to match the resolution of the colour frames. The down-sampling and up-sampling process lowers the quality of the depth frames but reduces the transmission cost. That work shows that viewers find it very hard to distinguish between the original video and the one that has gone through down-sampling and up-sampling.

This paper is organized as follows. Section II reviews error concealment for stereoscopic 3D video. Section III describes the proposed method. Section IV presents the simulation results and evaluates the method. Finally, Section V draws the conclusion.

978-1-4799-0269-9/13/$31.00 ©2013 IEEE

2013 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)



II. ERROR CONCEALMENT FOR STEREOSCOPIC 3D VIDEO

On low-bitrate connections, video packet losses may corrupt a whole frame of a video sequence during transmission. Research has focused mostly on 2D video error concealment, which might not be applicable to stereoscopic video because it ignores the depth map and its correlation with the 2D image [4], [5]. Given these transmission issues, an effective error concealment (EC) algorithm [6], [7] is needed to minimize the effect of an erroneous transmission channel and of the frames it loses. Temporal error concealment (TEC) [7] utilizes the correlation between successively received frames to compensate for missing frames or macroblocks. Most proposed error concealment methods target 2D video, and only a few address 3D video transmission. 3D video error concealment methods mainly focus on motion vectors (MVs) [8]; in this approach, MVs are derived by exploiting the correlation between the 2D video and the depth map.

Many proposed methods interpolate the lost region of a frame from the surrounding image areas; these methods might not give the desired result for stereoscopic video because they lack depth information. Researchers report that the human visual system (HVS) perceives errors differently in 3D video than in 2D video. It has been shown that a small degradation in one view can lead to noticeable perceptual distortion [9].

Stereoscopic error concealment mostly focuses on concealing errors in the right and left views of the video by using additional data from the correlated image sequence [9], [10]. In [11], a method based on shared motion vectors (MVs) exploits the correlation between the colour frame and the depth map to obtain the motion vectors used to predict the depth map.

III. DECISION MAKING BASED ON PIXEL VALUE (DM-PV)

The proposed method, called Decision Making based on Pixel Value (DM-PV), aims to conceal lost frames at the decoder, based on the correlation between the correctly received 2D images and the corresponding depth map. When a depth frame is lost during transmission, we take the corresponding colour image, compare it with the previous colour image in the sequence, and make a decision that depends on the difference. There are three main steps:

1) Find the index number of the lost depth frame and then find the corresponding colour frame.

2) Compare the corresponding colour frame with the previous frame in the colour sequence.

3) Make the decision based on the comparison made in step 2.

A. Finding the lost frame and its corresponding colour frame

It is inevitable that some frames are lost during transmission, and depth frames are the most likely to be lost because of their lower priority. Most methods assign more protection to the delivery of colour frames because of their higher visual importance. In H.264/SVC, depth frames are placed in the enhancement layers [12]. Thus, with such methods, frame loss is more likely in the depth sequence. At the receiver, when a frame is lost in the depth sequence, we look up the corresponding frame in the colour sequence; this colour frame is used for the comparison in the next step. The process is illustrated in Fig. 2.

Fig. 2. The red frame is the lost frame; we choose the same frame in the colour sequence for the comparison in the next step.
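As a rough illustration of this lookup, the following Python sketch lists the depth frames that never arrived and pairs each one with the colour frames needed for the comparison. The function names, the idea that every frame carries a sequence index, and the handling of a lost first frame are assumptions made for the example; the paper does not specify them.

```python
def find_lost_depth_frames(received_depth_indices, total_frames):
    """Return the indices of depth frames that never arrived, in ascending order.

    Assumes (hypothetically) that each received frame carries its sequence index.
    """
    received = set(received_depth_indices)
    return [i for i in range(total_frames) if i not in received]


def matching_colour_frames(colour_frames, lost_index):
    """Return (previous, current) colour frames for a lost depth frame.

    Returns None when the lost frame is the first in the sequence,
    because there is no previous frame to compare against (assumed behaviour).
    """
    if lost_index == 0:
        return None
    return colour_frames[lost_index - 1], colour_frames[lost_index]
```

The colour sequence is assumed to be received intact, which matches the premise that colour frames get stronger protection than depth frames.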

B. Comparing the corresponding colour frame with the previous frame in the sequence

Because colour frames and their corresponding depth maps are highly correlated, we can use the colour sequence to estimate how similar the lost depth frame is to the previous frame in the sequence. First, the two consecutive colour frames are converted to grayscale images, because the colour components are not important for this comparison. We then calculate the absolute difference of the two images and take the average pixel values to obtain a difference percentage. The difference percentage between the two consecutive frames is estimated as

$$\mathrm{DPG} = \frac{\mathrm{ADG} \times 100}{\mathrm{AS}} \tag{1}$$

where DPG represents the difference percentage in grayscale, ADG is the average of the absolute difference of the two grayscale images, and AS denotes the average pixel value of the second of the two consecutive images.
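A minimal sketch of Eq. (1) in plain Python follows, with frames held as nested lists of pixel values. The BT.601 luma weights used for the grayscale conversion and the handling of an all-black second frame are assumptions; the paper states only that the frames are converted to grayscale.

```python
def to_gray(rgb_frame):
    """Convert a frame of (R, G, B) tuples to grayscale.

    The BT.601 luma weights are an assumption; the paper does not name a formula.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_frame]


def dpg(prev_gray, curr_gray):
    """Difference percentage in grayscale: DPG = ADG * 100 / AS."""
    n = sum(len(row) for row in curr_gray)
    # ADG: average absolute difference between the two grayscale frames.
    adg = sum(abs(c - p)
              for prev_row, curr_row in zip(prev_gray, curr_gray)
              for p, c in zip(prev_row, curr_row)) / n
    # AS: average pixel value of the second (current) frame.
    avg_second = sum(sum(row) for row in curr_gray) / n
    if avg_second == 0:
        # The ratio is undefined for an all-black second frame; treating any
        # difference as maximal is an assumption, not part of the paper.
        return 0.0 if adg == 0 else float("inf")
    return adg * 100.0 / avg_second
```

For example, two 2x2 frames with values `[[100, 100], [100, 100]]` and `[[110, 90], [100, 100]]` have ADG = 5 and AS = 100, so DPG is 5 percent.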

C. Making the decision based on the comparison

In a video sequence, consecutive frames usually differ only slightly, so if a depth frame is lost during transmission we can base the decision on a comparison of the corresponding frames in the colour sequence. A threshold value on DPG is needed to make this decision. Note that the threshold value is chosen by evaluating the subjective quality of the synthesised video. The results show that the method performs best at higher thresholds; that is, the threshold should be set at a high DPG value.

Fig. 3. Second frame in the depth sequence has been duplicated.

Fig. 4. A blank frame is inserted in place of the missing frame.

Fig. 5. Left view for frame copy (left) and left view for DM-PV (right). The circles highlight noticeable differences between frame copy and DM-PV.

TABLE I
LEFT AND RIGHT PSNR VALUES (IN DB) FOR 10 PERCENT FRAME LOSS

| Method | Sequence | PSNR Left | PSNR Right |
|---|---|---|---|
| DM-PV | Break Dance | 94.19 | 94.20 |
| DM-PV | Ballet Dance | 92.51 | 92.52 |
| Frame Copy | Break Dance | 93.14 | 93.15 |
| Frame Copy | Ballet Dance | 91.97 | 91.99 |

The decision has two possible outcomes:

1) Sometimes the difference between two consecutive frames in a video sequence is not noticeable. In such cases, copying the previous frame in place of the lost frame gives acceptable quality. In the proposed method, if the discrepancy between the two consecutive frames is less than the threshold value, we duplicate the previous frame instead of leaving the lost frame blank. An example is illustrated in Fig. 3.

2) If the difference between the two frames is more than the threshold, we create a blank frame and insert it in place of the lost frame. The blank frame is made by considering the average colour of the neighbouring frames in the depth sequence. The process is demonstrated in Fig. 4.
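The two outcomes of the decision can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the "blank frame" is interpreted here as a uniform frame filled with the mean pixel value of the neighbouring depth frames, a DPG exactly equal to the threshold is treated as the blank-frame case, and the names `blank_frame` and `conceal` are hypothetical.

```python
def blank_frame(prev_depth, next_depth=None):
    """Uniform frame filled with the mean value of the neighbouring depth frames.

    Interpreting "average colour from neighbouring frames" as a single mean
    value is an assumption made for this sketch.
    """
    neighbours = [prev_depth] if next_depth is None else [prev_depth, next_depth]
    values = [v for frame in neighbours for row in frame for v in row]
    mean = round(sum(values) / len(values))
    return [[mean] * len(prev_depth[0]) for _ in prev_depth]


def conceal(prev_depth, dpg_value, threshold, next_depth=None):
    """Replace a lost depth frame based on the colour-frame difference (DPG)."""
    if dpg_value < threshold:
        # Small change between colour frames: duplicate the previous depth frame.
        return [row[:] for row in prev_depth]
    # Large change (or equal to the threshold, by assumption): insert a blank frame.
    return blank_frame(prev_depth, next_depth)
```

Passing the DPG value in as a parameter keeps the decision separate from the comparison step, so a different frame-difference measure could be substituted without touching this logic.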

TABLE II
LEFT AND RIGHT PSNR VALUES (IN DB) FOR 20 PERCENT FRAME LOSS

| Method | Sequence | PSNR Left | PSNR Right |
|---|---|---|---|
| DM-PV | Break Dance | 87.43 | 87.44 |
| DM-PV | Ballet Dance | 88.62 | 88.63 |
| Frame Copy | Break Dance | 86.63 | 86.63 |
| Frame Copy | Ballet Dance | 88.02 | 88.03 |

IV. RESULT AND DISCUSSION

Two methods are compared in the experiment: frame copy with no decision-making algorithm, and DM-PV. We used two 3D video sequences, 'Break Dance' and 'Ballet Dance'. The original sequences are used, with no compression or video codec applied. Uncompressed video is used in order to get more precise results.

Break Dance is a sequence with many objects and movements; in contrast, Ballet Dance has fewer objects and movements. We applied random frame loss to the depth sequence at rates of 10 and 20 percent and used both methods to conceal the lost frames. The simulation results are shown in Table I and Table II.
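One way such a random loss pattern could be drawn is sketched below. The paper does not describe its loss model beyond calling it random, so the uniform sampling, the fixed seed, and the choice to never drop the first frame are all assumptions, and `draw_lost_frames` is a hypothetical name.

```python
import random


def draw_lost_frames(total_frames, loss_rate, seed=None):
    """Pick which depth frames to drop, uniformly at random without repetition."""
    rng = random.Random(seed)
    n_lost = round(total_frames * loss_rate)
    # Frame 0 is never dropped so that every lost frame has a predecessor
    # to copy from (an assumption made for this sketch).
    return sorted(rng.sample(range(1, total_frames), n_lost))
```

Calling `draw_lost_frames(100, 0.10, seed=1)` returns ten distinct frame indices between 1 and 99; fixing the seed makes a run repeatable so that both concealment methods see the same losses.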

The results show that DM-PV obtains a higher PSNR for both the left and right views. They also show that the proposed method performs better on sequences with a lot of motion and changing objects. In other words, when two consecutive frames differ dramatically, DM-PV can be an efficient way to improve PSNR. In very low-motion sequences, however, the two methods can give exactly the same results, because the difference between consecutive frames never reaches the threshold value. In that scenario, the only concealment applied is frame copy.
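For reference, PSNR between a reference view and a concealed view can be computed with the standard definition, PSNR = 10 log10(MAX^2 / MSE). The sketch below assumes 8-bit frames (peak value 255) stored as nested lists; the paper does not state exactly how its left-view and right-view PSNR values were obtained, so this is only the textbook formula.

```python
import math


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    n = sum(len(row) for row in reference)
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2
              for ref_row, test_row in zip(reference, test)
              for r, t in zip(ref_row, test_row)) / n
    if mse == 0:
        # Identical frames: PSNR is unbounded.
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A sequence-level figure would typically be the average of this value over all frames of a view, although that averaging step is also an assumption here.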

The subjective quality is improved, especially in high-motion sequences. Frame copy with no decision making can lead to high distortion in one or both of the views. As illustrated in Fig. 5, the high-motion object in the middle shows considerable distortion with frame copy. DM-PV reduces this distortion in high-motion sequences.

V. CONCLUSION AND FUTURE EXTENSIONS

Packet loss is highly likely during transmission, especially over error-prone channels such as wireless channels, and it leads to frame loss in video transmission. This paper focuses on error concealment at the receiver by applying DM-PV. Lost frames in the depth sequence can be concealed with the proposed method. It is shown that DM-PV improves the quality of stereoscopic video, giving a better PSNR than the frame copy method. Furthermore, the subjective quality of stereoscopic video improves in high-motion sequences compared with frame copy.

At present, pixel values are used to compare two consecutive frames in the colour sequence. In future work, more precise methods of comparing the frames will be explored. The current blank frame will also be improved to achieve a better PSNR and subjective quality than the present DM-PV.

REFERENCES

[1] I. Ahmad, “Multi-View Video: Get Ready for Next-Generation Television,” Distributed Systems Online, IEEE, vol. 8, no. 3, p. 6, 2007.

[2] C. Fehn, “Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV,” Electronic Imaging 2004, pp. 93–104, 2004. [Online]. Available: http://dx.doi.org/10.1117/12.524762

[3] N. Shah, H. A. Karim, and M. F. Ahmad Fauzi, “Further Reduced Resolution Depth Coding for stereoscopic 3D video,” in Signal and Image Processing Applications (ICSIPA), 2011 IEEE International Conference on, 2011, pp. 94–99.

[4] Y. Xu and Y. Zhou, “H.264 video communication based refined error concealment schemes,” Consumer Electronics, IEEE Transactions on, vol. 50, no. 4, pp. 1135–1141, 2004.

[5] Y. Chen, K. Yu, J. Li, and S. Li, “An error concealment algorithm for entire frame loss in video transmission,” in Picture Coding Symposium, 2004, pp. 15–17.

[6] W.-M. Lam, A. R. Reibman, and B. Liu, “Recovery of lost or erroneously received motion vectors,” Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on, vol. 5, pp. 417–420, 1993.

[7] D. Agrafiotis, D. R. Bull, and C. N. Canagarajah, “Enhanced error concealment with mode selection,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 16, no. 8, pp. 960–973, 2006.

[8] B. Yan and J. Zhou, “Efficient Frame Concealment for Depth Image-Based 3-D Video Transmission,” Multimedia, IEEE Transactions on, vol. 14, no. 3, pp. 936–941, 2012.

[9] S. Knorr, C. Clemens, M. Kunter, and T. Sikora, “Robust concealment for erroneous block bursts in stereoscopic images,” in 3D Data Processing, Visualization and Transmission, 2004. 3DPVT 2004. Proceedings. 2nd International Symposium on, 2004, pp. 820–827.

[10] X. Xiang, D. Zhao, Q. Wang, X. Ji, and W. Gao, “A Novel Error Concealment Method for Stereoscopic Video Coding,” in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 5, 2007, pp. V-101–V-104.

[11] C. T. E. R. Hewage, S. Worrall, S. Dogan, and A. M. Kondoz, “Frame concealment algorithm for stereoscopic video using motion vector sharing,” in Multimedia and Expo, 2008 IEEE International Conference on, 2008, pp. 485–488.

[12] C. T. E. R. Hewage, H. A. Karim, S. Worrall, S. Dogan, and A. M. Kondoz, “Comparison of stereo video coding support in MPEG-4 MAC, H.264/AVC and H.264/SVC,” in Proc. of IET Visual Information Engineering-VIE07, 2007.
