AAC is firmly established in all multimedia markets today. There are currently over seven billion devices with AAC support. AAC is the surround sound codec for digital radio and TV broadcasts such as BBC, NHK, TV Globo and HbbTV. Apple, Google, Pandora and many other companies use it for streaming applications. MPEG AAC enables surround sound in HTML5 browsers such as IE9, Firefox, Safari and Chrome. Furthermore, MPEG AAC is part of many global application standards for content, streaming, DTV and radio broadcast, and CE device connectivity, such as ARIB, ATSC Mobile, HbbTV, OIPF, DMB, DRM, DVB and 3GPP.
The most common application areas are:
MPEG AAC is an established digital radio and TV broadcast codec deployed in the EU, South America and Japan. It is mandatory in ISDB-T and part of DVB, it is the only mandatory codec for stereo and surround in HbbTV, and it supports a full set of metadata to comply with the CALM Act and EBU R128.
MPEG AAC supports up to 48 audio channels, including 5.1, 7.1 and 22.2 configurations, together with well-defined downmix parameters. This performance has resulted in HDTV deployments worldwide and in AAC becoming the predominant audio codec in broadcast and streaming.
- DVB-S, -T, -C, -H and the follow-up versions -S2, -T2, -C2
- ATSC M/H (Mobile/Handheld) and NRT (Non-Real-Time)
- SBTVD (Brazilian version of the Japanese ISDB standard)
- ISDB: MPEG-2 AAC, and MPEG-2 AAC+SBR for the 1-seg mobile TV system
High quality for stereo and multichannel audio
- Audio Compression Landscape (Figure courtesy of Werner Oomen, Philips) based on "EBU – TECH 3324: EBU Evaluations of Multichannel Audio Codecs"
Audio Specific Metadata
Audio-related metadata for broadcasting systems are typically generated at some point in the content production chain or are part of a pre-encoded delivery. The metadata can be conveyed alongside the coded audio. Three features of metadata are of particular importance and are frequently referred to as the "3 Ds":
Dialogue Normalization adjusts the level of the main program components so that the long-term average stays constant across different program material, e.g. a feature film interspersed with commercials. Dynamic Range Control (DRC) controls the final dynamic range of the audio and adjusts compression to suit individual listening requirements. Downmix maps the channels of a multi-channel signal to the user's mono or two-channel stereo speaker configuration.
These terms come from the metadata parameters defined for the AC-3 audio codec, which are used for emission in some digital television systems. They also relate to the Dolby® E audio codec, which is used in the broadcast production and contribution chain.
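The downmix step described above can be sketched in a few lines. This is a minimal illustration, not the codec's own implementation: it assumes the widely used 5.1-to-stereo mapping in which the center and surround channels are attenuated by about 3 dB (a factor of 1/sqrt(2)) and the LFE channel is dropped. In a real system the coefficients would come from the downmix metadata carried with the stream; the function names and the channel-dictionary layout here are hypothetical.

```python
import math

# Assumed default coefficient: -3 dB, a common choice for center and surround channels.
MINUS_3_DB = 1.0 / math.sqrt(2.0)


def downmix_5_1_to_stereo(sample, center_gain=MINUS_3_DB, surround_gain=MINUS_3_DB):
    """Map one 5.1 sample to a stereo pair (Lo, Ro).

    `sample` is a dict with keys "L", "R", "C", "LFE", "Ls", "Rs".
    The LFE channel is ignored in this sketch.
    """
    lo = sample["L"] + center_gain * sample["C"] + surround_gain * sample["Ls"]
    ro = sample["R"] + center_gain * sample["C"] + surround_gain * sample["Rs"]
    return lo, ro


def stereo_to_mono(lo, ro):
    """Average the stereo pair to obtain a mono sample."""
    return 0.5 * (lo + ro)


if __name__ == "__main__":
    s = {"L": 0.5, "R": 0.25, "C": 0.2, "LFE": 0.9, "Ls": 0.1, "Rs": 0.0}
    lo, ro = downmix_5_1_to_stereo(s)
    print(lo, ro, stereo_to_mono(lo, ro))
```

A production downmix would also guard against clipping, for example by scaling the result or applying the transmitted compression gains, which this sketch leaves out.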
The AAC codec supports these same features and goes beyond them by adding more advanced metadata features. The naming convention is slightly different; the following table compares the Dolby® nomenclature of the parameters listed above with their equivalents in the AAC codec.
|Feature|AAC|Dolby®|
|---|---|---|
|Loudness Normalization|"Program Reference Level"|"Dialnorm"|
|Dynamic Range Control: "Light Compression"|"Dynamic Range Control"|"Line Mode"|
|Dynamic Range Control: "Heavy Compression"|"compression value"|"RF Mode"|
The metadata in the (E-)AC-3 format can be translated into the AAC metadata format and vice versa, allowing seamless integration of the AAC codec into a production chain using Dolby® E or an environment with AC-3 pre-encoded delivery.
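As an illustration of such a translation, the sketch below converts the loudness normalization parameter in both directions. It rests on assumptions that should be checked against the relevant specifications: that the AC-3 dialnorm code is an integer from 1 to 31 meaning -1 to -31 dBFS in 1 dB steps, and that the AAC program reference level is a 7-bit value counting 0.25 dB steps below full scale. The function names are hypothetical.

```python
def dialnorm_to_prog_ref_level(dialnorm):
    """Convert a dialnorm code (assumed 1..31, i.e. -1..-31 dBFS)
    to a program reference level code (assumed 0.25 dB steps below full scale)."""
    if not 1 <= dialnorm <= 31:
        raise ValueError("dialnorm must be in 1..31")
    return dialnorm * 4  # 1 dB equals four 0.25 dB steps


def prog_ref_level_to_dialnorm(prog_ref_level):
    """Convert a program reference level code (assumed 0..127)
    to the nearest dialnorm code, clamped to 1..31."""
    if not 0 <= prog_ref_level <= 127:
        raise ValueError("prog_ref_level must be in 0..127")
    level_db = prog_ref_level / 4.0  # magnitude in dB below full scale
    return min(31, max(1, round(level_db)))


if __name__ == "__main__":
    code = dialnorm_to_prog_ref_level(27)
    print(code, prog_ref_level_to_dialnorm(code))
```

Because the assumed AAC field has finer resolution and a slightly wider range, the conversion toward dialnorm is lossy: values between whole decibels are rounded, and values outside -1 to -31 dBFS are clamped.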
xHE-AAC for Digital Radio Mondiale
Extended HE-AAC (xHE-AAC), the latest upgrade to the MPEG AAC standard, is the first MPEG audio codec to combine speech and general-purpose audio coding in a unified system. This allows for high quality delivery of any type of audio content at virtually any bit rate.
Digital Radio Mondiale (DRM) is the first global broadcast standard to adopt xHE-AAC.
xHE-AAC provides improved audio quality for mixed-signal content at the ultra-low bit rates used in the DRM AM bands. At the same time, xHE-AAC is a superset of the MPEG AAC codec, which has been used for all DRM transmissions so far and will remain available as part of the DRM standard.
DRM broadcasters also benefit from a simplified codec configuration process: all quality-relevant parameters are automatically optimized internally by the encoder, so there is no need to change configuration settings depending on the type of audio content being broadcast.
A major shift in audio and video streaming techniques has emerged with the introduction of the ISO standard for Dynamic Adaptive Streaming over HTTP (DASH), also known as MPEG-DASH. MPEG-DASH delivers high quality streaming multimedia content over the Internet through conventional HTTP servers. It adapts seamlessly to changing network conditions, avoiding the buffering interruptions that frustrate viewers. MPEG-DASH was designed to help the industry overcome the market fragmentation caused by proprietary adaptive streaming technologies. Even though the standard was finalized only recently, Microsoft Internet Explorer 11 and Google Chrome already support the secure delivery of MPEG-DASH content, and many industry players and SDOs have announced upcoming product and standard support for DASH.
The DASH Industry Forum (DASH-IF), an industry body that promotes the adoption of MPEG-DASH, recently published the DASH-AVC/264 implementation guidelines that make HE-AAC the only mandatory stereo audio codec and an optional audio codec for multichannel sound. HE-AAC is the ideal choice for DASH due to its encoding efficiency that allows for seamless audio bit rate switching and surround sound streaming, without the need to switch to stereo when bandwidth is constrained.
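The bit rate switching mentioned above is decided by the client. A minimal sketch of one common approach, throughput-based selection, is shown below. It is not taken from the DASH specification or the DASH-IF guidelines, which leave the adaptation logic to the player. The bit rate ladder, the safety factor and the function name are all hypothetical.

```python
def select_audio_bitrate(available_bps, measured_throughput_bps, safety_factor=0.8):
    """Pick the highest offered bit rate that fits within a fraction of the
    measured throughput; fall back to the lowest one if none fits."""
    if not available_bps:
        raise ValueError("at least one representation is required")
    budget = measured_throughput_bps * safety_factor
    fitting = [rate for rate in available_bps if rate <= budget]
    return max(fitting) if fitting else min(available_bps)


if __name__ == "__main__":
    # Hypothetical ladder of audio representations, in bits per second.
    ladder = [32_000, 64_000, 128_000]
    for throughput in (1_000_000, 100_000, 10_000):
        print(throughput, select_audio_bitrate(ladder, throughput))
```

A player would typically re-run this choice for each segment it is about to request, so the audio bit rate follows the network conditions without interrupting playback.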
Many application standards bodies, such as HbbTV, DVB and ATSC, plan to adopt MPEG-DASH and with it the HE-AAC audio codec. Since HE-AAC is already used in most of the traditional TV broadcast standards, using it for IP-based delivery of premium content is a natural fit and reduces costs for manufacturers, broadcasters and service providers.
In mobile audio streaming, HE-AAC is the de facto standard, and not only because 3GPP selected it as the standard for music delivery. It is supported by all mobile operating systems, media player platforms and the majority of HTML5 browsers, which makes it ubiquitous and an obvious choice for service providers.
The AAC-ELD codec is the state-of-the-art MPEG-4 audio codec for maximum speech and audio quality with very low coding delay. It is a successful codec in the communication market and is used in Apple's FaceTime application, in video conferencing and telepresence systems by Cisco, Tandberg and Polycom, and in broadcast contribution devices by Telos.
Major operating systems such as iOS, Android and Mac OS, and important international standards such as TIP, ETSI/DECT, OIPF and N/ACIP, include low delay versions of AAC.
By supporting the full audio bandwidth of 20 kHz, AAC is able to deliver Full-HD Voice audio quality to IP-communication applications and devices.
Over-the-Top Services and IP Video Telephony
To overcome the limitations of speech codecs, Apple's OTT peer-to-peer video telephony service, FaceTime, is based on the Full-HD Voice codec, AAC-ELD. As FaceTime is available on most Apple devices, such as iPhone, iPad, and Mac, the service can be used today on more than 200 million devices and that number is growing rapidly.
Video Conferencing and Telepresence
For video conferencing and telepresence services, users have demanding expectations for both video and audio quality. As a result, Full-HD Voice has long been a default choice for providers. Most companies offer Full-HD Voice, and a majority of these products are based on the TIP standard, which assures interoperability between devices from different manufacturers. The TIP standard chose AAC-LD as the only mandatory codec besides G.711, the legacy voice codec used in narrowband telephony.
Low Delay codecs such as AAC-ELD can be a vital asset because they provide the low bit rate capabilities and high-quality error concealment required to handle live contribution over IP networks. As a result, AAC-ELD is already widely deployed in broadcast contribution devices such as the Telos Zephyr/IP and the Comrex Access.
Telepresence at Home
The low delay audio codecs can provide the basis for comprehensive solutions integrated into broadband connected devices, including PCs, TVs, set-top boxes and mobile phones. Staying in touch should be as natural as having a face-to-face conversation. Fraunhofer's low delay audio codecs allow service providers and hardware manufacturers to make this goal a reality.
OTT voice over Long Term Evolution (LTE)
LTE requires the deployment of all-IP voice services, or Voice-over-LTE (VoLTE). This opens up the prospect of eventually phasing out the legacy circuit-switched services based on GSM, UMTS and CDMA networks and on wired public switched telephone networks by moving all voice services onto IP networks. The development of Full-HD Voice makes it possible for service providers to shake off the limitations of these legacy services, including the very limited audio bandwidth and the use of speech codecs.
AAC Features for Communication Applications: Quality
- (Source: Deutsche Telekom, 2010 [AES 129, Ulf Wüstenhagen et al., "Evaluation of Super-Wideband Speech and Audio Codecs"])
AAC Features for Communication Applications: Algorithmic delay
- This graph gives an overview of the resulting algorithmic delay for different audio codecs running at typical bit rates.
(Source: Fraunhofer IIS)