Introduction

With the exception of its first video coding standards, MPEG has always supported more than rectangular monovision. This chapter explores the evolution of this endeavour over the years.

The early days

MPEG-1 did not have big ambitions (but the outcome was not modest at all ;-). MPEG-2 was ambitious because it included scalability – a technology that reached maturity only some 10 years later – and multiview. As depicted in Figure 18, multiview was possible because, when two closely spaced cameras point at the same scene, it is possible to exploit intraframe, interframe and interview redundancy.

Figure 18: Redundancy in multiview video

MPEG-2 scalability and multiview both saw little take-up. MPEG-4 Visual and AVC also had multiview profiles: AVC included Multiview Video Coding (MVC), which was adopted by the Blu-ray Disc Association. The rest of the industry, however, took another turn, as depicted in Figure 19.

Figure 19 – Frame packing in AVC and HEVC

If the left and right frames of two video streams are packed in one frame, regular compression can be applied to the packed frame. At the decoder, the packed frames are decompressed and then unpacked to yield the two video streams.

This is a practical but less than optimal solution. Unless the frame size of the codec is doubled, the horizontal or the vertical resolution is compromised, depending on the frame-packing method used. Because of this, a host of other more sophisticated, but eventually unsuccessful, frame-packing methods have been introduced into the AVC and HEVC standards. The relevant information is carried by Supplemental Enhancement Information (SEI) messages, because the specific frame-packing method used is not normative as it applies to the “display” part of the process.
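As a rough illustration of the resolution trade-off described above, the following Python sketch packs two views side by side by dropping every second column of each view, then splits the packed frame again. The function names are hypothetical, frames are plain lists of rows, simple decimation stands in for whatever filtering and resampling a real system would apply, and the codec that would sit between packing and unpacking is left out.

```python
# Hypothetical helpers illustrating side-by-side frame packing.
# A frame is a list of rows; each row is a list of sample values.

def decimate_columns(frame):
    """Keep every second column, halving the horizontal resolution."""
    return [row[::2] for row in frame]


def pack_side_by_side(left, right):
    """Pack two views into a single frame of the original frame size."""
    half_left = decimate_columns(left)
    half_right = decimate_columns(right)
    return [l_row + r_row for l_row, r_row in zip(half_left, half_right)]


def unpack_side_by_side(packed):
    """Split a packed frame back into two half-width views."""
    half = len(packed[0]) // 2
    left = [row[:half] for row in packed]
    right = [row[half:] for row in packed]
    return left, right


left_view = [[10, 11, 12, 13], [14, 15, 16, 17]]
right_view = [[20, 21, 22, 23], [24, 25, 26, 27]]

packed_frame = pack_side_by_side(left_view, right_view)
recovered_left, recovered_right = unpack_side_by_side(packed_frame)
```

With 4-sample-wide inputs, each recovered view is only 2 samples wide: the packed frame keeps the frame size of the codec, at the cost of half the horizontal resolution per view.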

The HEVC standard, too, supports 3D vision with tools that efficiently compress depth maps and exploit the redundancy between video pictures and associated depth maps. Unfortunately, use of HEVC for 3D video has also been limited.

MPEG-I

The MPEG-I project – ISO/IEC 23090 Coded representation of immersive media – was launched at a time when the word “immersive” was prominent in many news headlines. Figure 20 gives three examples of immersivity where technology challenges increase moving from left to right.

Figure 20 – 3DoF (left), 3DoF+ (centre) and 6DoF (right)

In 3 Degrees of Freedom (3DoF) the user is static, but the head can yaw, pitch and roll. In 3DoF+ the user has the added capability of limited head movements along the three directions. In 6DoF the user can freely walk in a 3D space.
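The three cases above can be told apart by how far the viewer's position departs from the reference viewpoint. The Python sketch below is only an illustration: the `ViewerPose` type, the `classify` function and, in particular, the 0.3 m head-movement range are assumptions made for the example and are not taken from any MPEG specification.

```python
from dataclasses import dataclass


@dataclass
class ViewerPose:
    """Hypothetical viewer pose: orientation in degrees, position in metres."""
    yaw: float = 0.0
    pitch: float = 0.0
    roll: float = 0.0
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0


# Assumed limit for "some head movements"; not a value from a standard.
HEAD_RANGE_M = 0.3


def classify(pose, head_range_m=HEAD_RANGE_M):
    """Return the least demanding scenario that can serve this pose."""
    offset = (pose.x ** 2 + pose.y ** 2 + pose.z ** 2) ** 0.5
    if offset == 0.0:
        return "3DoF"   # rotation only, the viewer stays at the origin
    if offset <= head_range_m:
        return "3DoF+"  # small translation around the origin
    return "6DoF"       # free movement in the 3D space


looking_around = classify(ViewerPose(yaw=30.0, pitch=-10.0))
leaning = classify(ViewerPose(yaw=30.0, x=0.1))
walking = classify(ViewerPose(x=2.0, z=1.0))
```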

Currently there are several activities in MPEG that aim at developing standards that support some form of immersivity. While they had different starting points, they are likely to converge to one or, at least, a cluster of points (hopefully not to a cloud😊).

OMAF

Omnidirectional Media Application Format (OMAF) is not about compressing but about storing and delivering immersive video. Its main features are:

Support of several projection formats in addition to the equi-rectangular one
Signalling of metadata for rendering of 360° monoscopic and stereoscopic audio-visual data
Use of MPEG-H video (HEVC) and audio (3D Audio)
Several ways to arrange video pixels to improve compression efficiency
Use of the MP4 File Format to store data
Delivery of OMAF content with MPEG-DASH and MMT.
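To make the equi-rectangular projection mentioned in the list above more concrete, here is a minimal Python sketch that maps a viewing direction to a pixel of an equi-rectangular picture. It uses a generic convention (yaw from -180° to 180° across the width, pitch from 90° at the top to -90° at the bottom) chosen for the example; the function name is hypothetical, and the sign and orientation conventions of ISO/IEC 23090-2 may differ.

```python
def direction_to_pixel(yaw_deg, pitch_deg, width, height):
    """Map a viewing direction to (column, row) in an equi-rectangular picture.

    Assumed convention: yaw in [-180, 180] grows to the right,
    pitch in [-90, 90] grows upwards.
    """
    u = (yaw_deg / 360.0 + 0.5) * width
    v = (0.5 - pitch_deg / 180.0) * height
    # Clamp so that yaw = 180 or pitch = -90 still falls inside the picture.
    col = min(int(u), width - 1)
    row = min(int(v), height - 1)
    return col, row


centre = direction_to_pixel(0.0, 0.0, 3840, 1920)
upper_right = direction_to_pixel(90.0, 45.0, 3840, 1920)
```

Equal steps in yaw and pitch map to equal steps in columns and rows, which is why regions near the poles are heavily oversampled in this format.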

MPEG released OMAF v.1 in 2018 and is now working on v.2. The standard is published as ISO/IEC 23090-2.

3DoF+

If the current version of OMAF is applied to a 3DoF+ scenario, the user experience is affected by parallax errors that become more annoying as the movement of the head grows.
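A simple geometric model shows why the error grows with head movement. Assume the head moves sideways by a distance d while looking at an object at distance z. The object should then appear shifted by an angle of atan(d / z), but a video captured from a single viewpoint shows no shift at all, so the whole angle is the error. The Python sketch below only evaluates this simplified model; the function name is hypothetical.

```python
import math


def parallax_error_deg(head_shift_m, object_distance_m):
    """Angular shift, in degrees, that a single-viewpoint video fails to show.

    Simplified model: the head moves perpendicular to the viewing direction.
    """
    return math.degrees(math.atan2(head_shift_m, object_distance_m))


# The same object at 2 m, seen after head movements of 5 cm and 30 cm.
small_move = parallax_error_deg(0.05, 2.0)
large_move = parallax_error_deg(0.30, 2.0)

# The same 30 cm movement, but with the object 20 m away.
far_object = parallax_error_deg(0.30, 20.0)
```

In this model the error increases with the head movement and decreases with the distance of the object, so nearby objects are the ones that suffer most.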

To address this problem, MPEG is working on a specification of appropriate metadata (to be included in the red blocks in Figure 21) to help the Post-processor present the best image for the viewer’s position if one is available, or synthesise the missing image if not.

Figure 21: 3DoF+ use scenario

The 3DoF+ standard will be added to OMAF, which will be published as a 2nd edition. Both standards are planned to be completed in July 2020.

VVC

Versatile Video Coding (VVC) is the latest of MPEG’s video compression standards supporting 3D vision. Currently VVC does not specifically include full-immersion technologies, as it only supports omnidirectional video as in HEVC. However, VVC could not only replace HEVC in Figure 21, but also be the target of other immersive technologies, as will be explained later.

Point Cloud Compression

3D point clouds can be captured with multiple cameras and depth sensors. The points can number from a few thousand up to a few billion, with attributes such as colour, material properties etc. MPEG is developing two different standards; the choice between them depends on whether the points are dense (Video-based PCC) or sparse (Geometry-based PCC). The algorithms in both standards are scalable, progressive and support random access to subsets of the point cloud. V-PCC is lossy and G-PCC is currently lossless. See here for an example of a Point Cloud test sequence being used by MPEG for developing the V-PCC standard.

MPEG plans to release Video-based Point Cloud Compression as FDIS in January 2020 and Geometry-based Point Cloud Compression as FDIS in April 2020.

Alongside PCC, MPEG is working on Carriage of Point Cloud Data with the goal of specifying how PCC data can be stored in ISOBMFF and transported with DASH, MMT etc.

6DoF

MPEG is carrying out explorations on technologies that enable 6 degrees of freedom (6DoF). The reference diagram for that work looks like a minor extension of the 3DoF+ reference model (see Figure 22). However, it may have huge technology implications.

Figure 22: 6DoF use scenario

To enable a viewer to freely move in a space and enjoy a 3D virtual experience that matches the one in the real world, we still need some metadata as in 3DoF+ but likely also additional video compression technologies that could be plugged into the VVC standard.

Light field

The MPEG Video activity is all about standardising efficient technologies that compress digital representations of sampled electromagnetic fields in the visible range captured by digital cameras. Roughly speaking we have 4 types of camera:

1. Conventional cameras with a 2D array of sensors receiving the projection of a 3D scene
2. An array of cameras, possibly supplemented by depth maps
3. Point cloud cameras
4. Plenoptic cameras, whose sensors capture the intensity of the light arriving from a number of different directions.

Technologically speaking, #4 is an area that has not been shy with promises and is delivering on some of them. However, economic sustainability for companies engaged in developing products for the entertainment market has been a challenge.

MPEG is currently engaged in Exploration Experiments (EE) to check:

The coding performance of Multiview Video Data (#2) for 3DoF+ and 6DoF, and Lenslet Video Data (#4) for Light Field
The relative coding performance of Multiview coding and Lenslet coding, both for Lenslet Video Data (#4).

However, MPEG is not engaged in checking the relative coding performance of #2 data and #4 data because there are no #2 and #4 test data for the same scene.
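Comparing Multiview coding and Lenslet coding on the same Lenslet Video Data presupposes that a lenslet picture can be rearranged into a set of views. A common way to do this, sketched below in Python, is to collect the pixel at the same position under every microlens into one view. The sketch assumes an idealised sensor in which each microlens covers exactly n × n pixels on a square grid. Real plenoptic cameras need calibration and resampling first, and the source does not say that this is the conversion used in the MPEG experiments. The function name is hypothetical.

```python
def lenslet_to_views(lenslet, n):
    """Rearrange an idealised lenslet picture into n * n views.

    lenslet: list of rows; each microlens covers an n x n block of pixels.
    Returns a dict mapping (a, b), the pixel position under the microlens,
    to the view built from that position under every microlens.
    """
    rows, cols = len(lenslet), len(lenslet[0])
    views = {}
    for a in range(n):
        for b in range(n):
            views[(a, b)] = [
                [lenslet[r][c] for c in range(b, cols, n)]
                for r in range(a, rows, n)
            ]
    return views


# A 4 x 4 lenslet picture with 2 x 2 pixels under each of its 4 microlenses.
lenslet_picture = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
views = lenslet_to_views(lenslet_picture, 2)
```

Each resulting view has one pixel per microlens, so the set of views can be handed to a multiview coder, while the original picture can be coded directly as lenslet data.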

Conclusions

In the good(?) old times MPEG could develop video coding standards – from MPEG-1 to VVC – by relying on established input video formats. To some extent this continues to be true for Point Clouds as well. Light Field, on the other hand, is a different matter because the capture technologies are still evolving and the format in which the data are provided has an impact on the processing that MPEG applies to reduce the bitrate.

MPEG has bravely picked up the gauntlet and its machine is grinding data to provide answers that will eventually lead to one or more visual compression standards to enable rewarding immersive user experiences.

MPEG is holding a “Workshop on standard coding technologies for immersive visual experiences” in Gothenburg (Sweden) on 10 July 2019. The workshop, open to the industry, will be an opportunity for MPEG to meet its client industries, report on its results and discuss industries’ needs for immersive visual experiences standards.
