Multicast Transmission of Scalable Video using Receiver-driven Hierarchical FEC

Wai-tian Tan and Avideh Zakhor
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley, CA 94720, USA
email: {dtan avz}@eecs.berkeley.edu
 

1. Introduction

The deployment of IP multicast in the MBONE has provided a convenient and efficient way of disseminating video information to multiple recipients over the Internet. However, since the recipients may have different delay tolerances and are typically connected to the source via paths of different delay, bandwidth and loss characteristics, traditional approaches to flow and error controls based on source adaptations may not be effective. A more elegant alternative is the receiver-driven approach [1] which allows receivers to decide on the communication parameters. For instance, an effective solution to the bandwidth heterogeneity of today's high-speed networks is to use scalable video compression [2,18,19,20,22] with layered multicast [1]. Under layered multicast, scalable video is partitioned into different layers which are carried using different multicast groups. Each receiver can then individually subscribe or unsubscribe to those multicast groups to effect flow control. Other schemes have also been proposed that require additional support from the network, such as explicit congestion notification and hop-by-hop flow control, to further enhance the overall bandwidth utilization by dynamically adjusting the number of video layers and the rate of each layer in a layered multicast setting [3].

On the error control side, most work has focused on providing end-to-end reliability based on retransmissions [4, 5, 6] or hybrid schemes employing forward error correction (FEC) and retransmissions [7,8]. Because the cost of retransmissions in a multicast setting is high, such schemes are often justified only for low bandwidth applications that require reliability, such as the shared white-board in the MBONE tools. Retransmissions may also be inapplicable for real-time applications, especially when round trip time is long. In contrast, pure FEC based schemes are simpler, but can be wasteful as redundancy packets are transmitted to all receivers regardless of whether they are needed or not.

In this work, we will investigate the application of hierarchical FEC as an error control method for scalable video multicasting. By hierarchical FEC, we mean the production of embedded FEC or redundancy streams, each of which belongs to a different multicast group. In such a way, subscribing to more groups corresponds to higher level of protection. There are two advantages to using hierarchical FEC. First, each receiver can independently adjust the desired level of protection based on past reception statistics and the application's delay tolerance. This receiver-driven approach to FEC is more flexible than most existing FEC schemes for multicast where the source determines a set of redundancy packets which are then multicast to every recipient [8]. Second, each receiver will subscribe to only as many redundancy layers as necessary, reducing overall bandwidth utilization. Thus, the hierarchical nature of the FEC layers ensures minimum network utilization through sharing of common streams. An effective means to combat burstiness in packet losses is suggested in [9] where redundancy packets are transmitted at a later time than the actual data packets. Following [9], we delay the transmission of FEC layers to relieve the effects of bursty losses. The additional latency associated with the FEC layers creates a trade-off between higher levels of protection and increased latency, and also as a disincentive for receivers to unnecessarily use FEC.

The use of scalable video has three implications. First, it is possible to achieve flow control by adjusting the number of data layers based on the observed channel characteristics. Second, it is possible to drop data layers to make room for FEC layers so as not to increase overall transmission rate. Third, it is possible to strategically provide FEC only to the most important data layer so as to reduce the amount of FEC traffic. The optimal division of the available bandwidth between data and FEC depends on the error resilience of the scalable video source. Since less error-resilient compression schemes suffer higher penalty when packets are lost, they tend to allocate larger portions of the available bandwidth for FEC. This generally causes the quality of the received video to be lower. As a result, it is desirable to have error resilience even when FEC is used for error control. In this paper, we will evaluate our framework using a scalable and error resilient compression scheme that we have developed for Internet transmissions [18,19,20].

The remaining of the paper is organised as follows. An overview of our proposed system architecture is described in Section 2. The selection of error control code for implementing the hierarchical FEC is described in Section 3. We will describe our scalable video compression scheme and its application to the proposed hierarchical FEC framework in Section 4 and present experimental results in Section 5. Conclusion and future work are in Section 6.

2. Proposed System Architecture

In a multicast environment, besides bandwidth heterogeneity, different users may experience different packet loss characteristics and may have different degrees of tolerance to latency. For example, in live multicast of a lecture, participants who want to ask questions and interact with the lecturer desire stringent real-time constraints on the video while passive listeners may be willing to sacrifice latency for higher video quality. Because of the high bandwidth required for video traffic, generating a separate stream for every possible set of user requirements and channel loss rate is infeasible. Instead, we wish to cater to different users with varying latency requirements by providing a framework in which each user can individually trade-off latency for quality.

2.1 Video Sources

In our proposed scheme, a scalable video source not only sends video packets to multiple multicast groups as in layered multicast, it also constructs hierarchical FEC layers for the video layers.  Fig.1 illustrates our proposed scheme using 3 layers of video and 2 layers of hierarchical FEC:
 
Fig.1: Timing Diagram of Hierarchical FEC Scheme
 
where GOP-x-y denotes the y-th data layer corresponding to the x-th GOP.  Similarly, FEC-x-y denotes the y-th FEC layer generated for the x-th GOP.

Addresses D1, D2, D3 are used to carry layered video data and addresses R1, R2 are used to carry redundancy packets. All layers of video data corresponding to the same group of pictures (GOP) that are compressed together are transmitted in the same time slot. As a result, users with stringent delay requirements can simply subscribe to D1, and possibly D2 and D3 depending on the bandwidth that is available to them. Users with higher tolerance of latency who are also experiencing packet losses can subscribe to addresses R1 and potentially R2 to obtain redundancy to repair lost packets at the cost of extra delay. The reason for spreading the redundancy layers for the same GOP across different time slots is to achieve interleaving, which is an effective means of combating bursty losses often experienced in the Internet [9]. Fig. 2 summarizes the behavior for different users. We see that FEC packets are only transmitted to branches that lead to subscribers requiring FEC, thereby reducing overall network load.

In order to determine the optimum distribution of FEC packets across video layers, one can setup and solve an optimization problem [18].  Instead, in the experiments reported in this paper, we only apply FEC to the first layer of Video rather than all layers. The approach is motivated by the fact that (1) the first layer video is generally much more important than other layers, and (2) it reduces the computation each receiver needs to decide the number of FEC layers to subscribe to.  The packets in the FEC layers are generated by applying the FEC code described in Section 3 on the packet level.  The sizes of the FEC layers determine the granularity of available protection.  Generally, providing many small FEC layers improves the granularity of available protection levels but incurs a higher networking overhead associated with maintaining the larger number of multicast groups.
 

Fig.2: Receiver-driven Hierarchical FEC schemes with multiple Receivers

2.2 Video Receivers

A central problem in layered multicast is determining the rate control algorithm at the receivers. Receiver-driven layered multicast (RLM) of [1] is an early scheme that drops a layer when congestion is detected and "probes" for more bandwidth when losses are absent. Since the act of probing for bandwidth may cause congestion which leads other receivers to mistakenly reduce their subscription levels, the difficult tasks of coordinating probes and communicating their results are necessary [1,12]. An alternative approach that does not involve probing, but instead calculates the bandwidth directly using measured quantities at the receiver is described in [10,12]. The scheme is based on the TCP-friendly equation [11] that estimates the average throughput T of a TCP connection under packet loss rate p and round trip time rtt: where MTU, the maximum transport unit, is a constant that is often taken to be 500 bytes.

Because the problem of rate control in a layered multicast setting is still an area of active research and that our proposed hierarchical FEC scheme does not depend on any specific rate control scheme, we adopt a simple rate control algorithm which is sufficient for the scale of experiments we are performing. Specifically, our rate control scheme is similar to [10] in that every receiver estimates the available bandwidth according to Equation 1 in an uncoordinated fashion.  To avoid unnecessary reaction to network transients, the bandwidth calculation is performed relatively infrequently, with a period of 50 seconds.  The packet loss rate p is measured over the whole period, and over all data and FEC layers while rtt is updated only once every 5 seconds to reduce the load on the source. Clearly, it is possible to augment our rate control approach by adding synchonization mechanisms as in [12].

Given a dynamic bandwidth constraint obtained by periodically applying Equation 1, the receiver then decides on the optimal number of data versus FEC layers to subscribe. The partition of bandwidth between data and FEC depends on the characteristics of the scalable video compression method used and is described in Section 4.2.

3. Selection of Error Control Code for Hierarchical FEC

An effective FEC scheme will allow recovery of lost data packets with as little redundancy packets as possible. This is achieved when the loss of any m data packets are recoverable from any m parity packets. Such properties are often referred to as maximal distance separable (MDS) in algebraic coding theory [14]. An example of MDS code is the Reed-Solomon code. Fig. 3 shows the use of an (n, k) MDS code which takes k data packets and produces n-k parity packets. The basic idea behind hierarchical FEC is to partition the n-k parity packets into FEC layers of potentially different sizes. A source may decide to transmit the data packets with any number of FEC layers, and a receiver will be able to recover all the data packets from any combinations of k data or parity packets. For example, if we form the first FEC layer by taking the first f1 packets from the n-k parity packets, since any m lost data packets can be corrected using any m of the n-k parity packets, they can also be corrected by any m packets in the first FEC layer as long as m does not exceed f1. Similarly, the second FEC layer is formed by the next f2 parity packets and any combination of m parity packets from the first two FEC layers can correct any m lost data packets as long as m does not exceed f1+f2.
Fig. 3: Generation of FEC layers by puncturing MDS codes
 

The successive refinement property of Reed-Solomon codes has long been exploited in data communications to provide incremental redundancy. For example, in the scheme of [15], a sender will continually transmit parity packets until a receiver has enough parity packet to recover all lost data packets. More recently, the dual code of Reed-Solomon code, which is also MDS, has been used to reduce the average transmission time of data in a large reliable data multicast setting [16].

Following [16], we select the dual code of the Reed-Solomon code as the MDS code for generating FEC packets. The dual code has no theoretical advantage over Reed-Solomon code and is chosen primarily because of the availability of existing software implementations [17].

4. Selection of Scalable Video Compression

In this section, we will describe our error resilient scalable video codec [18,20] for actual experiments on the MBONE. The codec is chosen because The ability to limit the effects of packet loss is important since even with our proposed architecture in Section 2, packet losses may still be present due to traffic burstiness or because real-time constraints have precluded the use of FEC layers. Furthermore, a more error-resilient scheme requires less FEC at the same loss rate. Under our proposed video compression scheme, when a packet is lost, errors do not propagate to other packets in the same layer and only propagate to one packet in the upper layer. As is shown in Fig. 4, this is in contrast to most scalable video compression schemes which assume transport prioritization and produce packets that are highly dependent.
 
Fig.4: Data Dependency of Proposed Scalable Video

4.1 Achieving Rate Scalability and Resilience

The proposed compression scheme is based on 3-D subband coding followed by hierarchical block coding [19]. After subband analysis, we perform data partitioning as follows. Each of the M subbands is divided into an equal number of coefficient blocks, say P. We then form P components from the coefficient blocks by allocating one coefficient block from each subband to each component.  The coefficient blocks of a particular component are chosen in such a way as to correspond to a different spatio-temporal region .  This is done to ensure that a loss of a component does not result in complete loss of information about any one particular spatio-temporal region in the video. Fig. 5 illustrates the partitioning in the 2-D case where seven subbands are obtained, each of which is divided into 9 blocks.  The colored blocks represent one possible way to form a component.
 
Fig. 5: Data Partitioning of Subbands

Each component is then compressed independently as follows. Each coefficient block is split into bit-planes. Each bit-plane is compressed using hierarchical block coding of [19] to produce a small codeword which provides fine granularity for layered packetization. Packetization is carried out so that the bit-plane offering the best rate-distortion trade-off is packed first. The scheme is illustrated in Fig. 6.
 

Fig. 6: Compression into scalable video with restricted data dependence
 
The first layer packets are obtained by truncating the compressed bit-planes at rate R1 and the second layer is formed by the portion between rates R2 and R1, etc. In such a way, the loss of a packet in any layer will only affect packets in higher layers that correspond to the same component, as shown in Fig. 4.

4.2 Partition of bandwidth between Data and FEC

In this section, we consider the problem of dividing the available bandwidth between data and FEC layers. Our objective is to minimize the expected distortion given an estimate of the packet loss rate and a priori knowledge of the rate-distortion characteristics of the video material.

Assume constant packet sizes. When d layers of data are subscribed by a receiver, the expected distortion is given by:

where Di is the distortion associated with decoding i data layers and pi is the probability that we decode at distortion level Di,.  Assuming a total of f  FEC layers are subscribed and that each data layer and FEC layer contains Nd and Nf packets respectively, the bandwidth constraint of B packets results in: The quantities Di are given by the rate-distortion characteristics of video source and do not depend on the observed packet loss rate p. The Di can be explicitly computed at the source and then communicated to the receivers.  Alternatively, one can classify video sources according to their characteristics and then generate representative values for Di for every class. In that case, the source can be relieved from the task of having to compute the distortions associated with every possible rate. The probabilities pi depend on p and the data dependencies between packets in the scalable bit-stream.  For data dependencies given by Fig. 6, the probability that a first layer packet is lost and cannot be recovered by FEC is given by: The other pi are given by: Formally, the optimation problem is to find the combination of f and d that satisfies the bandwidth constraint given by Equation 3 and minimizes the expected distortion given by Equation 2. Instead of testing all possible f and d that satisfy the bandwidth constraint and search for the combination that yields minimum expected distortion, the problem can be simplified by observing that the expected distortion is monotonically decreasing with f and d.  As a result, it suffices to test all possible d from 1 to B/Nd  while keeping f as large as possible without exceeding the bandwidth constraint. Each receiver then performs the minimization by performing a linear search of d from 1 to B/Nd  . The optimal number of data layers is then computed as: The optimality is only within the specified framework.  First, a more general approach of providing FEC to all layers of data instead of only the first layer can yield lower distortion given the same total bandwidth for data and FEC layers, but increases the complexity of the receivers. When only the first layer is protected, the bandwidth partitioning problem is analogous to partitioning B packets to 2 bins, one for data and one for first layer FEC. This involves only an one dimensional search.  When N data layers are protected by FEC, we have to consider all possible partitioning of B packets into N bins, one for data and one for FEC for each of the protected data layers. This involves an N dimensional search in which the dependency between layers needs to be taken into account.  Second, the granularity of the available levels of protection is dictated by the sizes of the FEC layers. Since a receiver cannot subscribe to a fraction of a layer, the optimization is only within all possible levels of protection provided.

4.3 Compression Performance and Speed

The compression performance of the proposed codec is found to be comparable to MPEG-1 implementation reported in [21] at the bit rate range of 500 kbps to 3 Mbps for a talking head sequence and a high motion sequence [18].  It also permits real-time encoding and decoding on current workstations. Specifically, in the bit rate range of 500 kbps to 3 Mbps, encoding and decoding both proceed at the range of 18 fps to 12 fps.

5. Experimental Results

In this section, we will describe actual MBONE experiments using the proposed scheme. Multicast experiments are performed using UC Berkeley as the source and two machines at Georgia Tech and ISI in Los Angeles as receivers. The paths from Berkeley to ISI and Georgia Tech traverses 12 and 13 links respectively with average round trip times of 28 and 80 ms respectively. There is a total of 7 common links for the two paths. The video source is the Raider sequence at 12 fps and size 320x224. A total of 4 data layers each of 100 kbps and 4 FEC layers each of 50 kbps are generated. In Fig. 1, every GOP spans 4 video frames or 1/3 second, and consists of 14 packets of size 320 bytes per data layer, and 7 packets of size 320 bytes per FEC layer. Therefore, an additional latency of 4/3 seconds is introduced when a user decides to subscribes to all FEC layers.

We perform two experiments. In the first experiment, both receivers have stringent real-time requirements so that no FEC layers will be subscribed to. In the second experiment, which is run immediately after the first one, receivers are non-interactive and when loss rates are non-trivial, they may sacrifice some data layers to yield bandwidth to FEC layers.

In our experiments, the link from Berkeley to Georgia Tech is never congested and no packet losses are observed. As a result, the receiver at Georgia Tech is subscribing at the maximum number of 4 data layers and without any FEC layers in both experiments. The link from Berkeley to ISI however, has an average packet loss rate of over 18% and given a rtt of about 28 ms, Equation 1 results in an average throughput of 400 kbps. In the first experiment, all of the 400 kbps is allocated to data whereas in the second experiment, 200 kbps is allocated to data and the other 200 kbps to FEC. The partition of bandwidth between FEC and data is determined by the measured packet loss rate in such a way that the expected distortion is minimized, as described in Section 4.2.  The rate-distortion characteristics of the video is precomputed at the source and released to the receivers before the experiment.

Fig. 7 shows the packet loss rates experienced at the different data layers in the two experiments for the connection from Berkeley to ISI. It is observed that the actual observed packet loss rates do not change significantly between the two experiments and that the packet loss rate across layers is more or less constant.

Fig. 7: Observed packet loss rate at different data layers (Berkeley to ISI)

Fig. 8 shows the the fraction of packets received per GOP for experiment 2 for the Berkeley to ISI connection. We observe that when measured over intervals of 1/3 second, packet losses show much burstiness, which necessitates the delay in sending the FEC layers.

Fig. 8: Packet reception rate for experiment 2 (Berkeley to ISI)
 
Each experiment is run for a duration of 300 GOP or approximately 100 seconds. For experiment 2, there are 130 GOPs out of 300 in which at least one first layer packet is lost. Of the 130 GOPs, 116 are corrected by the use of FEC. The decoded video trace for experiments 1 and 2 are shown in Fig. 9.
Fig. 9: MSE trace for video in experiments 1 (top) and 2 (bottom) from Berkeley to ISI
 
The average distortion for experiments 1 and 2 are 333 and 85 respectively. Even though both experiments consume the same bandwidth, the use of FEC for the first layer enhances the reception quality of experiment 2 at the expense of an extra 4/3 second of latency.

6. Conclusions and Future Work

We have described in this paper a scheme using scalable video with receiver-driven hierarchical FEC so that each receiver can individually trade-off latency for better reception quality. The scheme is efficient in that FEC is provided only for the most important layer of the video data, and that there is no loss in error correcting capability by imposing a hierarchical FEC structure. We propose the use of an error resilient scalable video compression scheme to alleviate the impact of packet losses that cannot be corrected by FEC and we conducted experiments over the MBONE which demonstrates the potential benefits of scheme.

There are several issues that need to be addressed further. First, the current approach to determine the number of FEC layers are based on minimization of expected distortion assuming independent packet losses. More meaningful metrics that yield visually pleasing rather than minimum mean squared error video are desired. Second, the current approach to flow control is completely decentralized. Even though the receivers do not actively probe for bandwidth, the adding and dropping of layers are uncoordinated. It is interesting to investigate whether synchronization techniques as proposed in [12,13] are necessary for the network to achieve stability. Third, the current transport protocol is plain UDP. The additional incorporation of RTP would provide a standardized framework for the measurements of quantities such as round-trip time and delay jitter.

7. Acknowledgements

The authors would like to thank Dr. Anindo Banerjea of ISI, Prof Edward Delp of Purdue University and Prof Ammar Mostafa of Georgia Tech for providing remote accounts for MBONE experiments.

8. References

[1] S.McCanne, V.Jacobson and M.Vetterli. Receiver-driven Layered Multicast. Proc. Sigcomm, pp 117-130, October 1996.

[2] N.Shacham. Multipoint Communication by Hierarchically Encoded Data Proc. Infocomm, pp 2107-2114, May 1992.

[3] B.Vickers, C.Albuquerque and T.Suda. Adaptive Multicast of Multi-layered Video: Rate-based and Credit-based Approaches. Proc. Infocomm, pp 1073-1083, March 1998.

[4] S.Floyd, V.Jacobson, C.Liu, S.McCanne and L.Zhang. A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing. IEEE Trans. Networking, Vol. 5, No. 6, pp 784-803, December 1997.

[5] X.Xu, A.Myers, H.Zhang and R.Yavatkar. Resilient Multicast Support for Continuous Media Applications. Proc. NOSSDAV '97, pp 183-194, May 1997.

[6] X.Li, S.Paul, P.Pancha and M.Ammar, Layered Video Multicast with Retransmissions (LVMR): Evaluation of Error Recovery Schemes. Proc. NOSSDAV '97. pp 161-172, May 1997.

[7] S.Pejhan and M.Schwartz. Error Control using Retransmission Schemes in Multicast Transport Protocols for Real-time Media, IEEE Trans. Networking, Vol. 4, No. 3, pp 413-427, June 1996.

[8] J.Nonnenmacher, E.Biersack and D.Towsley. Parity-based Loss Recovery for Reliable Multicast Transmission. IEEE/ACM Trans. Networking, Vol. 6, No. 4, pp 349-361, August 1998.

[9] J.Bolot and A.Vega-Garcia. The case for FEC based Error Control for Packet Audio in the Internet., ACM Multimedia Sys.1997.

[10] T.Turletti, S.Parisis and J.Bolot. Experiments with a Layered Transmission Scheme over the Internet. INRIA Technical Report. http://www.inria.fr/RRRT/RR-3296.html

[11] M.Mathis, J.Semke, J.Mahdavi and T.Ott. The Macroscopic Behavior of the TCP Congestive Avoidance Algorithm. CCR, Vol. 27, No. 3, July 1997.

[12] L.Vicisano, L.Rizzo and J.Crowcroft. TCP-like Congestion Control for Layered Multicast Data Transfer. Proc. InfoComm 98, Vol. 3, pp 996-1003, March 1998.

[13] X.Li, S.Paul and M.Ammar. Layered Video Multicast with Retransmissions (LVMR): Evaluation of Hierarchical Rate Control. Proc. Infocomm '98. pp 1062-1072, March 1998.

[14] S.Wicker. Error Control Systems for Digital Communication and Storage. Prentice-Hall, 1995.

[15] D.Mandelbaum. An Adaptive-Feedback Coding Scheme using Incremental Redundancy. IEEE Trans. Info. Theory, Vol. IT-20, No. 3, pp 388-389, May 1974.

[16] L.Rizzo and V.Visano. A Reliable Multicast Data Distribution Protocol based on Software FEC Techniques. Proc. 4th IEEE Workshop Arch. and Implementation of High Perf. Comm. Sys. (HPCS 97), 1997.

[17] L.Rizzo. Effective Erasure Codes for Reliable Computer Communication Protocols. CCR, Vol. 27, No. 2, pp 24-36, April 1997.

[18] W.Tan and A.Zakhor. Real-time Internet Video using Error Resilient Scalable Compression and TCP-friendly Transport Protocol. To appear in IEEE Trans. Multimedia.

[19] W.Tan, E.Chang and A.Zakhor. Real Time Software Implementation of Scalable Video Codec. Proc. ICIP, Vol. 1, pp 17-20, September 1996.

[20] W.Tan and A.Zakhor. Internet Video using Error-resilient Scalable Video Compression and Cooperative Transport Protocol. Proc. ICIP 98, Vol. 3, pp 458-462, October 1998.

[21] K.Patel, B.Smith and L.Rowe. Performance of a Software MPEG Video Decoder. Proc. of ACM Multimedia '93, pp 75-82, 1993.
 
[22] D.Taubman and A.Zakhor. Multirate 3-D Subband Coding of Video. IEEE Trans. Image Proc., Vol. 3, No. 5, pp 572-588, September 1994.