12.1 – Meaningful data can be compressed

Introduction

Obviously, video is a high-profile topic for MPEG people – MP stands for Moving Pictures (not Motion Pictures, which is another story) – and audio is another high-profile topic. This should not be a surprise, given that the official MPEG title is “Coding of Moving Pictures and Audio”.

What is less known, but potentially very important, is the fact that MPEG has already developed a number of standards for the compression of a wide range of other data types. Point Cloud is the data type acquiring a higher profile by the day, but there are many more types, as represented in the table below, where standards are identified by their MPEG acronyms and the numbers in the respective columns are the part numbers.

Table 11 – Data types and relevant MPEG standards

Video

The chapters Forty years of video coding, More video features and Immersive visual experiences provide a detailed history of video compression in MPEG from different perspectives. Here the video coding-related standards, produced or under development, are listed.

Table 12 – MPEG Video-related standards

Standard Part Description
MPEG-1 Part 2 Widely used video coding standard
MPEG-2 Part 2 Widely used video coding standard
MPEG-4 Part 2 Visual is used for video on the internet and for movie compression
Part 9 Reference Hardware Description supports a reference hardware description of the standard expressed in VHDL (VHSIC Hardware Description Language), a hardware description language used in electronic design automation
Part 10 Advanced Video Coding (AVC) is the ubiquitous video coding standard
Part 29 Web Video Coding was planned to be Option 1 (royalty free), a goal that was not achieved
Part 31 Video Coding for Browsers was planned to be Option 1; it has been discontinued
Part 33 Internet Video Coding was planned to be Option 1, a goal not achieved because there are 3 Option 2 patent declarations. When the affected technologies are disclosed, MPEG will remove them from the standard
MPEG-5 Part 1 Essential Video Coding will have a base layer/profile which is expected to be Option 1 and a second layer/profile with a performance ~25% better than HEVC. Licensing terms are expected to be published by patent holders within 2 years
Part 2 Low Complexity Enhancement Video Coding (LCEVC) will be a two-layer video coding standard. The lower layer is not tied to any specific technology and can be any video codec; the higher layer is used to extend the capability of an existing video codec
MPEG-7 is about Multimedia Content Description; it includes different tools to describe visual information
Part 3 Visual is a form of compression, as it provides tools to describe Colour, Texture, Shape, Motion, Localisation, Face Identity, Image Signature and Video Signature
Part 13 Compact Descriptors for Visual Search can be used to compute compressed visual descriptors of an image. An application is to get further information about an image captured, e.g., with a mobile phone
Part 15 Compact Descriptors for Video Analysis allows large-scale databases of video content to be managed and organised, e.g. to find content containing a specific object instance or location
MPEG-C is a collection of video technology standards that do not fit within other standards
Part 4 Media Tool Library is a collection of video coding tools (called Functional Units) that can be assembled using the technology standardised in MPEG-B Part 4 Codec Configuration Representation
MPEG-H Part 2 High Efficiency Video Coding (HEVC) is the latest MPEG video coding standard, with a compression improvement of 60% compared to AVC
MPEG-I is the new suite of standards, mostly under development, for immersive technologies
Part 3 Versatile Video Coding is the ongoing project to develop a video compression standard with an expected 50% more compression than HEVC
Part 7 Immersive Media Metadata is the current project to develop a standard for compressed Omnidirectional Video that allows limited translational movements of the head
Explorations Expl. 1 6 Degrees of Freedom (6DoF)
Expl. 2 Light field

Audio

The chapter Audio compression in MPEG provides a detailed history of audio compression in MPEG. Here the audio coding-related standards, produced or under development, are listed.

Table 13 – MPEG Audio-related standards

Standard Part Description
MPEG-1 Part 3 Audio produced, among others, the foundational digital audio standard better known as MP3
MPEG-2 Part 3 Audio extended the stereo user experience of MPEG-1 to multichannel
Part 7 Advanced Audio Coding is the foundational standard on which MPEG-4 AAC is based
MPEG-4 Part 3 Advanced Audio Coding (AAC) is currently used by some 10 billion devices and software applications, growing by half a billion units every year
MPEG-7 is about Multimedia Content Description
Part 4 Audio provides different tools to describe audio information
MPEG-D is a collection of different audio technologies
Part 1 MPEG Surround provides an efficient bridge between stereo and multichannel presentations in low-bitrate applications, as it can transmit 5.1-channel audio within the same 48 kbit/s transmission budget
Part 2 Spatial Audio Object Coding (SAOC) allows very efficient coding of a multichannel signal that is a mix of objects (e.g. individual musical instruments)
Part 3 Unified Speech and Audio Coding (USAC) combines the tools for speech coding and audio coding into one algorithm with a performance equal to or better than AAC at all bit rates. USAC can code multichannel audio signals and can also optimally encode speech content
Part 4 Dynamic Range Control is a post-processor for any type of MPEG audio coding technology. It can modify the dynamic range of the decoded signal as it is being played

2D/3D Meshes

Polygon meshes can be used to represent the approximate shape of a 2D image or a 3D object. 3D mesh models are used in various multimedia applications such as computer games, animation and simulation. MPEG-4 provides various compression technologies.

Table 14 – MPEG 2D/3D-related standards

Standard Part Description
MPEG-4 Part 2 Visual provides a standard for 2D and 3D Mesh Compression (3DMC) of generic, but static, 3D objects represented by first-order (i.e. polygonal) approximations of their surfaces. 3DMC has the following characteristics (a sketch after this table illustrates incremental rendering and progressive transmission):

1. Compression: near-lossless to lossy compression of 3D models
2. Incremental rendering: no need to wait for the entire file to download to start rendering
3. Error resilience: 3DMC has a built-in error-resilience capability
4. Progressive transmission: depending on the viewing distance, a reduced accuracy may be sufficient

Part 16 Animation Framework eXtension (AFX) provides a set of compression tools for Shape, Appearance and Animation
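As a rough illustration of incremental rendering and progressive transmission, here is a minimal Python sketch, assuming a mesh transmitted as refinement batches; it only mimics the behaviour described above and does not follow the actual MPEG-4 3DMC bitstream syntax.

```python
from dataclasses import dataclass

@dataclass
class ProgressiveMesh:
    vertices: list  # (x, y, z) tuples
    faces: list     # (i, j, k) vertex-index triples

def face_batches(mesh: ProgressiveMesh, batch_size: int):
    """Yield refinement batches so a decoder can start rendering before the
    whole file has arrived (incremental rendering), and can stop early when
    a reduced accuracy suffices (progressive transmission)."""
    for start in range(0, len(mesh.faces), batch_size):
        yield mesh.faces[start:start + batch_size]

# A tetrahedron used as a stand-in for a scanned 3D object.
mesh = ProgressiveMesh(
    vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)],
    faces=[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)],
)
received = []
for batch in face_batches(mesh, batch_size=2):
    received.extend(batch)
    print(f"renderable faces so far: {len(received)}")
```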

Face/Body Animation

Imagine you have a face model that you want to animate remotely. How do you represent the information that animates the model in a bit-thrifty way? MPEG-4 Part 2 Visual has an answer to this question with its Facial Animation Parameters (FAP), defined at two levels:

  • High level

o   Viseme (the visual equivalent of a phoneme)

o   Expression (joy, anger, fear, disgust, sadness, surprise)

  • Low level: 66 FAPs associated with the displacement or rotation of facial feature points.

In Figure 41 the feature points affected by FAPs are indicated as black dots. Other feature points are indicated as small circles.

Figure 41: Facial Animation Parameters

It is possible to animate a default face model in the receiver with a stream of FAPs or a custom face can be initialised by downloading Face Definition Parameters (FDP) with specific background images, facial textures and head geometry.
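To make the idea concrete, here is a hypothetical Python sketch of driving a face model with one frame of low-level FAPs; the feature-point names, axes and amplitudes are simplified placeholders, not the normative MPEG-4 FAP semantics.

```python
# Default face model: feature-point name -> mutable (x, y, z) position.
face_model = {
    "left_mouth_corner": [-1.0, 0.0, 0.0],
    "right_mouth_corner": [1.0, 0.0, 0.0],
}

# One FAP frame: (feature point, displacement axis, amplitude).
fap_frame = [
    ("left_mouth_corner", 1, 0.2),   # raise the left corner (smile)
    ("right_mouth_corner", 1, 0.2),  # raise the right corner
]

def apply_faps(model: dict, frame: list) -> None:
    """Displace the affected feature points; a renderer would then deform
    the surrounding face mesh accordingly."""
    for point, axis, amplitude in frame:
        model[point][axis] += amplitude

apply_faps(face_model, fap_frame)
print(face_model)
```

A stream of such frames, one per video frame, is enough to animate the model, which is what makes the representation so bit-thrifty.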

MPEG-4 Part 2 uses a similar approach for Body Animation.

Scene Graphs

So far MPEG has never developed a Scene Description technology of its own. In 1996, when the development of the MPEG-4 standard required one, MPEG took the Virtual Reality Modelling Language (VRML) and extended it to support MPEG-specific functionalities. Of course, compression could not be absent from the list. So the Binary Format for Scenes (BiFS), specified in MPEG-4 Part 11 Scene description and application engine, was born to allow for the efficient representation of dynamic and interactive presentations comprising 2D and 3D graphics, images, text and audiovisual material.

The representation of such a presentation includes the description of the spatial and temporal organisation of the different scene components, as well as user interaction and animations.
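A scene description of this kind can be pictured as a tree whose nodes carry spatial and temporal placement. The following minimal Python sketch shows the idea; the node names and fields are invented for illustration and are not BiFS syntax.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    transform: tuple = (0.0, 0.0, 0.0)  # spatial placement of the component
    start_time: float = 0.0             # temporal placement (seconds)
    children: List["Node"] = field(default_factory=list)

# A presentation mixing video, a caption appearing at t = 2.5 s, and audio.
scene = Node("root", children=[
    Node("video"),
    Node("caption", transform=(0.0, -1.0, 0.0), start_time=2.5),
    Node("audio"),
])

def walk(node: Node, depth: int = 0) -> None:
    """Print the spatial and temporal organisation of the scene."""
    print("  " * depth + f"{node.name} @ {node.transform}, t={node.start_time}s")
    for child in node.children:
        walk(child, depth + 1)

walk(scene)
```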

In MPEG-I, scene description again plays an important role. This time, however, MPEG does not even intend to pick a scene description technology. It will instead define interfaces to scene description parameters.

Font

Many thousands of fonts are available today for use as components of multimedia content. Content often utilises custom-designed fonts that may not be available on a remote terminal. In order to ensure the faithful appearance and layout of content, the font data have to be embedded with the text objects as part of the multimedia presentation.

MPEG-4 Part 18 Font Compression and Streaming defines two main technologies:

  • OpenType and TrueType font formats
  • Font data transport mechanism – the extensible font stream format, signalling and identification

Multimedia

Multimedia is a combination of multiple media in some form. Probably the closest multimedia “thing” in MPEG is the standard called Multimedia Application Formats (MPEG-A). However, MPEG-A is an integrated package of media for specific applications and does not define any specific media format; it only specifies how MPEG (and sometimes other) formats can be combined.

MPEG-7 Part 5 Multimedia Description Schemes (MDS) specifies the description tools that are neither visual nor audio, i.e. the generic and multimedia tools. By combining a large number of MPEG-7 description tools, from the basic audio and visual structures upward, MDS enables the creation of the structure of a description, the description of collections and user preferences, and the hooks for adding the audio and visual description tools. This is depicted in Figure 42.

Figure 42: The different functional groups of MDS description tools

Neural Networks

Requirements for neural network compression are presented in Moving intelligence around. After 18 months of intense preparation, with the development of requirements, identification of test material, definition of a test methodology and drafting of a Call for Proposals (CfP), at the March 2019 (126th) meeting MPEG analysed nine technologies submitted by industry leaders. The proposed technologies compress neural network parameters to reduce their size for transmission, while not or only moderately reducing their performance in specific multimedia applications. The new standard is MPEG-7 Part 17, Neural Network Compression for Multimedia Description and Analysis.
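A common family of techniques in this space is parameter quantisation. The sketch below shows toy uniform quantisation of a weight matrix in Python with NumPy; it is only a generic illustration of the size/fidelity trade-off, not the toolset adopted in the standard.

```python
import numpy as np

# Toy uniform quantisation of a neural-network weight matrix: fewer bits per
# parameter means a smaller network to transmit, at the cost of some error.
weights = np.random.randn(256, 256).astype(np.float32)

def quantize(w: np.ndarray, bits: int = 8):
    """Map float32 weights to signed integers with a single scale factor."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale).astype(np.int8), scale

q, scale = quantize(weights)
reconstructed = q.astype(np.float32) * scale

print(f"size ratio: {q.nbytes / weights.nbytes:.2f}")  # 0.25 of the original
print(f"max reconstruction error: {np.abs(weights - reconstructed).max():.4f}")
```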

XML

MPEG-B Part 1 Binary MPEG Format for XML (BiM) is the current endpoint of an activity that started some 20 years ago, when MPEG-7 Descriptors defined by XML schemas were compressed in a standard fashion by MPEG-7 Part 1 Systems. Subsequently, MPEG-21 needed XML compression and the technology was extended in MPEG-21 Part 15 Binary Format.

In order to reach high compression efficiency, BiM relies on schema knowledge shared between encoder and decoder. It also provides fragmentation mechanisms for transmission and processing flexibility, and defines means to compile and transmit schema knowledge information to enable the decompression of XML documents without a priori schema knowledge at the receiving end.
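As a toy illustration of why shared schema knowledge helps, the following Python sketch replaces tag names with one-byte codes that both sides derive from the same schema, so tag names never travel in the bitstream; the codebook, document and framing are invented for the example and are unrelated to the actual BiM syntax.

```python
import zlib

# Encoder and decoder derive the same element codebook from the schema.
schema_elements = ["Description", "Creator", "Title", "Date"]
code_of = {name: i for i, name in enumerate(schema_elements)}
name_of = {i: name for name, i in code_of.items()}

document = [("Title", "MPEG-7 demo"), ("Creator", "Alice"), ("Date", "2019-07-01")]

# Encoder: replace each tag with its 1-byte schema code, then compress.
encoded = b"".join(bytes([code_of[tag]]) + value.encode() + b"\x00"
                   for tag, value in document)
payload = zlib.compress(encoded)

# Decoder: decompress, then map codes back to tag names via the shared schema.
decoded = []
for record in zlib.decompress(payload).split(b"\x00")[:-1]:
    decoded.append((name_of[record[0]], record[1:].decode()))
assert decoded == document
```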

Point Clouds

3D point clouds can be captured with multiple cameras and depth sensors. The points can number from a few thousand up to a few billion, and can have attributes such as colour, material properties, etc.

MPEG is developing two different standards; the choice between them depends on whether the point cloud is dense (handled by MPEG-I Part 5 Video-based Point Cloud Compression) or sparse (MPEG-I Part 9 Geometry-based PCC). The algorithms in both standards are lossy, scalable, progressive and support random access to subsets of the point cloud.
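The core intuition behind the video-based approach is to project the 3D points onto 2D images that an ordinary video codec can then compress. Here is a minimal Python/NumPy sketch of that projection step; the resolution and projection axis are arbitrary choices for the example, and the real standard's patch generation and packing are far more sophisticated.

```python
import numpy as np

# A random point cloud standing in for a captured one: positions in [0, 1)
# plus a colour attribute per point.
points = np.random.rand(10000, 3)
colors = (np.random.rand(10000, 3) * 255).astype(np.uint8)

W = H = 64
depth_map = np.zeros((H, W), dtype=np.float32)
color_map = np.zeros((H, W, 3), dtype=np.uint8)

# Orthographic projection along z: keep the nearest point per pixel.
u = (points[:, 0] * W).astype(int).clip(0, W - 1)
v = (points[:, 1] * H).astype(int).clip(0, H - 1)
for i in range(len(points)):
    z = points[i, 2]
    if depth_map[v[i], u[i]] == 0 or z < depth_map[v[i], u[i]]:
        depth_map[v[i], u[i]] = z
        color_map[v[i], u[i]] = colors[i]

# depth_map and color_map would now be fed to an ordinary video encoder.
print(f"non-empty pixels: {(depth_map > 0).sum()} of {W * H}")
```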

MPEG plans to release Video-based PCC as FDIS in January 2020 and Geometry-based PCC as FDIS in April 2020.

Sensors/Actuators

In the middle of the first decade of the 2000s, MPEG recognised the need to address the compression of data from sensors and of data sent to actuators, when it considered the exchange of information taking place between the physical world where the user is located and any sort of virtual world generated by MPEG media.

Therefore MPEG undertook the task of providing standard interactivity technologies that allow a user to

  • Map their real-world sensor and actuator context to a virtual-world sensor and actuator context, and vice versa, and
  • Achieve communication between virtual worlds.
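As a toy illustration of the first capability, the following hypothetical Python sketch maps a sensed real-world value to a virtual-world command; the type names and fields are invented for the example and are not the normative MPEG-V XML schemas.

```python
def map_sensed_to_virtual(sensor_reading: dict) -> dict:
    """Translate a real-world sensed value into a virtual-world command."""
    if sensor_reading["type"] == "temperature":
        # e.g. drive an in-world weather effect from the room temperature
        return {"command": "set_ambient_temperature",
                "value_celsius": sensor_reading["value"]}
    raise ValueError(f"unmapped sensor type: {sensor_reading['type']}")

reading = {"type": "temperature", "value": 21.5, "unit": "celsius"}
print(map_sensed_to_virtual(reading))
```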

Figure 43 describes the context of the MPEG-V Media context and control standard.

Figure 43: Communication between real and virtual worlds

Standard Part Description
MPEG-V Part 2 Control information specifies control device interoperability (actuators and sensors) in real and virtual worlds
Part 3 Sensory information specifies the XML Schema-based Sensory Effect Description Language to describe actuator commands such as light, wind, fog, vibration, etc. that trigger human senses
Part 4 Virtual world object characteristics defines a base type of attributes and characteristics of the virtual world objects shared by avatars and generic virtual objects
Part 5 Data formats for interaction devices specifies the syntax and semantics of data formats for interaction devices – Actuator Commands and Sensed Information – required to achieve interoperability in controlling interaction devices (actuators) and in sensing information from interaction devices (sensors) in real and virtual worlds
Part 6 Common types and tools specifies the syntax and semantics of data types and tools used across MPEG-V parts

 

MPEG-IoMT Internet of Media Things is MPEG's mapping of the general IoT context to media. MPEG-IoMT Part 3 IoMT Media Data Formats and API also addresses the compression of data for media-based sensors and actuators.

Genome

The chapter MPEG standards for genomics presents the technology used in MPEG-G Genomic Information Representation. Many established compression technologies developed for other MPEG media have found good use in genome compression. MPEG is currently busy developing the MPEG-G reference software and is investigating other genomic areas where compression is needed. More concretely, MPEG plans to issue a Call for Proposals for Compression of Genome Annotation at its July 2019 (127th) meeting.

 
