Module 1.2 – Beyond Multichannel Audio
Topics covered in this article:
- Dolby Atmos concepts
- Bed audio
- Object audio and Object metadata
- The difference between theatrical and home Dolby Atmos
- Spatial coding
- Single and multiple Bed workflows
- Loudness measurement with Dolby Atmos
- Dolby Atmos master formats
- Delivery specifications
Dolby Atmos is an immersive audio format that provides a compelling listener experience in the cinema, at home, and even on mobile devices. This course is intended to familiarize you with the Dolby Atmos content creation workflow using a digital audio workstation (DAW), the Dolby Atmos Renderer software, and other components.
Before diving into the workflow, it is important to understand key Dolby Atmos concepts and how Dolby Atmos is delivered to consumers. That is what this article is all about.
With traditional multichannel audio, channels correspond to specific speaker locations. If, for example, the mixer wants the sound to come from the Left Surround speakers, they bus or pan the audio to the left surround channel. If a mixer wants the audio to appear as though it is coming from between the Left Surround and Left speakers, they pan between the two to create a phantom image. Traditional multichannel audio is utilized in Dolby Atmos; these channels are referred to as “Bed” audio. Bed configurations can range in width from Stereo to 7.1.2. The 7.1.2 nomenclature denotes seven ear-level channels, one LFE (low frequency effects) channel, and left and right overhead channels.
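The x.y.z naming described above can be unpacked mechanically. The following Python sketch splits a layout label into its ear-level, LFE, and overhead counts. The names `Layout` and `parse_layout` are illustrative only and are not part of any Dolby tool, and writing Stereo as the label "2.0" is an assumption made for this example.

```python
from typing import NamedTuple


class Layout(NamedTuple):
    """Channel counts read from a label such as '7.1.2' (illustrative type)."""
    ear_level: int
    lfe: int
    overhead: int

    @property
    def total(self) -> int:
        # Total number of channels in the layout.
        return self.ear_level + self.lfe + self.overhead


def parse_layout(label: str) -> Layout:
    """Split 'ear.lfe' or 'ear.lfe.overhead' into integer counts."""
    parts = [int(p) for p in label.split(".")]
    if len(parts) == 2:
        parts.append(0)  # labels such as '5.1' have no overhead channels
    if len(parts) != 3:
        raise ValueError(f"unexpected layout label: {label!r}")
    return Layout(*parts)


bed = parse_layout("7.1.2")
print(bed, bed.total)
```

For "7.1.2" the total is 7 + 1 + 2 = 10 channels, which is why the widest Bed occupies 10 Renderer inputs.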
Object Audio and Object Metadata
In addition to Bed audio, Dolby Atmos introduces the concept of Object audio (also referred to as audio Objects). Object audio is not bussed or panned directly to (or between) channels. Instead, Object audio is captured with panning information recorded as X, Y, and Z coordinates (the Z-axis being elevation or height). These X, Y, and Z coordinates, along with Object size, are recorded along with the audio as Object Audio Metadata (OAMD). OAMD is dynamic and is updated with every panning move as the Dolby Atmos Renderer is synchronized with the DAW.
Object audio and metadata are maintained separately from the Bed audio. In the end user's device, OAMD is used to render each Object in the correct position for that device's speaker configuration and capability. This makes it possible to address individual speakers in a configuration, increasing panning resolution beyond what is possible in a discrete channel-based system.
Mixing and recording Bed and Object audio along with OAMD is how Dolby Atmos content is created.
The Dolby Atmos Renderer captures up to 128 tracks of audio. The first 10 tracks are dedicated to capturing Bed audio with a width of up to 7.1.2, and the remaining 118 inputs can be used for Objects and/or additional Beds. Additional Beds may be used to facilitate multiple DAW systems working on different stems (e.g., Dialog, Music, Effects), and to simplify workflows that derive channel-based stems. Increasing the number of Beds reduces the number of tracks available for Object audio. More on this later.
Object Audio Renderer
When mixing in Dolby Atmos, the suggested speaker configuration is 7.1.4 (seven ear-level speakers, one subwoofer, and four overhead speakers – Left Top Front, Right Top Front, Left Top Rear, and Right Top Rear).
However, it is important to understand that the Dolby Atmos Renderer translates (renders) the mix created from Beds and Objects with OAMD to real-world speaker layouts, which can be configured in a number of ways. In professional mixing, this can range from binaural audio over headphones to speaker layouts of 7.1.4 and beyond.
For the listener at home, consumer devices can range from virtualized Dolby Atmos on phones/tablets using speakers or headphones, to TV speakers, soundbars, and discrete speaker systems using overhead or upward-firing speakers. Consumer speaker system layouts typically range from 5.1.2 to 9.1.6 and beyond.
When using the Dolby Atmos Renderer during mixing and for Dolby Atmos Master File playback, an Object Audio Renderer (OAR) is used. The OAR “renders” the Bed to the available speakers and the Object audio to spatial coordinates supplied by the OAMD. This OAR “render” uses the available speaker configuration with a resolution beyond what is possible in a discrete channel-based system.
The OAR is another central concept to Dolby Atmos. It is used during the creation of Dolby Atmos content, and in playback of Dolby Atmos content for consumers.
The recording of all Bed audio, Object audio, and OAMD, and the reproduction of Dolby Atmos audio with the OAR, are the essence of Dolby Atmos. (The recorded audio mix is sometimes called the Printmaster or Master File.)
The Difference Between Theatrical and Home Dolby Atmos
As mentioned previously, the Dolby Atmos Renderer can record up to 128 tracks, including 10 channels of Bed audio and up to 118 tracks of Object audio and associated OAMD. In the theater, the Dolby Atmos master is played back from a Digital Cinema Package off a Digital Cinema Server. Up to 128 tracks and OAMD are used by the OAR in the Dolby Cinema Processor to render Bed and Object audio for up to 64 discrete speakers. This creates a very full immersive audio experience for movie viewers.
Due to bandwidth constraints with over-the-top (OTT) streaming delivery and file size restrictions on Blu-ray, it is not practical to deliver the full Dolby Atmos master of up to 128 tracks and OAMD to the home listener. Nor is it practical for the OAR in consumer equipment to have the processing power required for a full Atmos presentation.
To deliver Dolby Atmos to the home, two other core concepts are introduced to reduce bit rate, file size, and complexity while preserving the artistic intent of the original mix and providing a full home immersive audio experience. These concepts are Spatial Coding and the use of delivery Codecs.
Spatial coding provides a way to reduce a full Dolby Atmos presentation to a smaller data set.
Spatial coding reduces the Atmos presentation:
- From: Up to 128 tracks with OAMD for up to 118 Objects
- To: 12, 14, or 16 elements and OAMD
Spatial coding is the first step in preparing a Dolby Atmos mix for home delivery.
Spatial coding is a process that dynamically groups nearby audio from Beds and Objects using loudness and positional algorithms into “elements” (sometimes called clusters) that contain their own OAMD. The elements themselves can move over time, and the Bed and Object audio can move between elements to more accurately reflect their position and trajectory. Below is a graphical representation of the spatial coding process.
While there may be up to 128 tracks in a Dolby Atmos presentation, the tracks are rarely all active at the same time. Even with complex and frenetic mixes, the dynamic elements produced by the spatial coding process provide the spatial resolution for the OAR to recreate an immersive soundfield. With the reference mix speaker configuration of 7.1.4 and common home theater speaker configurations up to 9.1.6 and beyond, the spatial coding process is transparent for most content.
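The idea of grouping nearby audio into a limited number of elements can be illustrated with a toy example. The Python sketch below is not Dolby's spatial coding algorithm, which is not described in this article. It swaps in a simple greedy merge of the two closest sources, with each merged position weighted by loudness. The names `Source`, `Element`, and `group_into_elements`, the linear loudness weights, and the coordinate values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist


@dataclass
class Source:
    name: str
    position: tuple[float, float, float]  # (x, y, z), hypothetical units
    loudness: float                       # hypothetical linear weight, > 0


@dataclass
class Element:
    members: list[str]
    position: tuple[float, float, float]
    loudness: float


def merge(a: Element, b: Element) -> Element:
    """Combine two elements; the louder one pulls the position toward itself."""
    total = a.loudness + b.loudness
    position = tuple(
        (pa * a.loudness + pb * b.loudness) / total
        for pa, pb in zip(a.position, b.position)
    )
    return Element(a.members + b.members, position, total)


def group_into_elements(sources: list[Source], max_elements: int) -> list[Element]:
    """Greedily merge the two nearest elements until at most max_elements remain."""
    if max_elements < 1:
        raise ValueError("max_elements must be at least 1")
    if any(s.loudness <= 0 for s in sources):
        raise ValueError("loudness weights must be positive")
    elements = [Element([s.name], s.position, s.loudness) for s in sources]
    while len(elements) > max_elements:
        i, j = min(
            combinations(range(len(elements)), 2),
            key=lambda p: dist(elements[p[0]].position, elements[p[1]].position),
        )
        merged = merge(elements[i], elements[j])
        elements = [e for k, e in enumerate(elements) if k not in (i, j)] + [merged]
    return elements


sources = [
    Source("A", (0.0, 0.0, 0.0), 1.0),
    Source("B", (0.1, 0.0, 0.0), 3.0),
    Source("C", (1.0, 1.0, 1.0), 1.0),
]
for element in group_into_elements(sources, max_elements=2):
    print(element.members, element.position)
```

In this toy case A and B sit close together, so they share one element while C keeps its own. A real spatial coder also lets elements move over time and lets audio migrate between them, which this sketch does not attempt.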
Spatial Coding Emulation
The spatial coding process takes place as part of the encoding process, downstream from post-production. While spatial coding is most often transparent, it can be audible with some content, depending on the number of elements used.
Spatial Coding Emulation is a feature of the Dolby Atmos Renderer that allows the mixer to audition what spatial coding sounds like prior to the encoding process. As spatial coding is part of the delivery of Dolby Atmos to the home, it is important that the mixer be satisfied with the results and make adjustments to the mix if needed. Spatial coding emulation doesn’t need to be on during initial sound design and editing, but it should be turned on as the mix comes together.
Spatial coding can be emulated with 12, 14, or 16 elements. It is important to understand the final delivery method(s) in order to monitor appropriately, as the number of elements that can be included will vary depending on the codec and bit rate. See appendix A for a more in-depth discussion of Dolby codecs used for the delivery of Dolby Atmos to consumers.
Note that spatial coding is not used in binaural rendering.
In addition to recording to the Dolby Atmos Master File and exporting to other mastering formats, the Dolby Atmos Renderer can also be used to output legacy channel-based deliverables. These 're-renders' are generated by the OAR. Re-renders can be output in real time (assigned to specific hardware outputs) and recorded back into a DAW or exported offline.
Re-renders can range in width from Stereo to 9.1.6 as well as Ambisonic formats. The most common output widths for repurposing content created in Dolby Atmos are Stereo, 5.1, 7.1, and 7.1.4 (for use with Gaming engines such as Wwise and Unity). Re-renders can contain the full mix or can be customized and derived from input groupings.
Single and Multiple Bed Workflows
The first 10 inputs to the Dolby Atmos Renderer are reserved for Bed inputs. Bed audio can range in width from Stereo to 7.1.2. The remaining 118 inputs to the Renderer can be used for either Bed audio or Object audio.
Within the DAW, mixers often group similar audio together into stems. In post-production work, the stems typically include Dialog, Music, and Effects. Other types of stems might include Narration, Foley, etc. In music production, stems could include Drums, Guitars, Keyboards, Vocals, etc. When working in Dolby Atmos, mixers can use one or more Bed tracks for each stem.
Bed tracks can be summed/combined to create a single composite Bed that is output from the DAW and feeds Renderer inputs 1-10, leaving the rest of the Renderer inputs available for Object audio.
If a mixer is required to generate channel-based stems from a Dolby Atmos mix that are used to prepare audio for non-Dolby Atmos delivery (for example a 5.1 Music and Effects stem and a separate 5.1 Dialog stem), the mixer must selectively mute tracks in the DAW and perform multiple mastering passes.
Another approach is to use a Multiple Bed Workflow. In this workflow, each stem has a dedicated Bed in the Dolby Atmos mix, each with dedicated outputs from the DAW and corresponding inputs to the Renderer.
Within the Renderer, the Bed and Objects for a given stem can be grouped so that it is possible to generate channel-based stems without the need to mute tracks in the DAW or perform multiple passes. This can be a huge time-saver. The tradeoff is that this leaves fewer Renderer inputs available for Object audio.
This approach is also essential for workflows with multiple DAWs each working on a specific stem, where each DAW is assigned a set of Renderer inputs for its stem Bed and Objects.
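The input budget behind this tradeoff is simple arithmetic. The Python sketch below assigns each stem a contiguous block of Renderer inputs for its Bed and Objects and reports how many inputs remain for Objects. The contiguous layout, the assumption that a Bed uses one input per channel, the stem sizes, and the function name `assign_inputs` are all hypothetical and chosen only to illustrate the idea.

```python
TOTAL_INPUTS = 128  # Renderer input count stated in this article


def assign_inputs(stems: dict[str, tuple[int, int]]) -> dict[str, range]:
    """Map each stem to a block of inputs.

    stems maps a stem name to (bed_channels, object_count).
    Returned ranges use 1-based input numbers.
    """
    assignments: dict[str, range] = {}
    next_input = 1
    for name, (bed_channels, object_count) in stems.items():
        stop = next_input + bed_channels + object_count  # exclusive end
        if stop - 1 > TOTAL_INPUTS:
            raise ValueError(f"stem {name!r} needs inputs beyond {TOTAL_INPUTS}")
        assignments[name] = range(next_input, stop)
        next_input = stop
    return assignments


# Three stems, each with its own 7.1.2 Bed (10 channels); sizes are made up.
stems = {"Dialog": (10, 20), "Music": (10, 30), "Effects": (10, 48)}
plan = assign_inputs(stems)
for name, inputs in plan.items():
    print(f"{name}: inputs {inputs.start}-{inputs.stop - 1}")

object_inputs = sum(objects for _, objects in stems.values())
print(f"Inputs used for Objects: {object_inputs} (118 with a single 7.1.2 Bed)")
```

With three 7.1.2 Beds, 30 inputs carry Bed audio, so 98 inputs remain for Objects instead of 118.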
The use of a Multiple Bed Workflow along with Re-renders is part of the Master Once/Deliver Everywhere premise of Dolby Atmos. Starting a mix in Dolby Atmos ensures universal compatibility by providing for the delivery of assets that are not in Dolby Atmos format. The use of different downmix modes and trims allows for great flexibility to ensure that channel-based deliverables sound as good as or better than a bespoke channel-based mix.
Loudness Measurement with Dolby Atmos
Loudness measurement is not performed on a full Dolby Atmos mix but instead on a 5.1 re-render. This is done for two reasons. First, there isn’t an effective way to measure the loudness of an entire Dolby Atmos presentation. Second, and more importantly, this practice ensures loudness continuity between Dolby Atmos content and content that is not mixed or presented in Dolby Atmos. A home viewer watching a movie in Dolby Atmos who switches back to 5.1 content should not experience a shift in loudness.
Loudness measurement can be performed using both the real-time and offline loudness measurement built into the Dolby Atmos Renderer. Alternatively, a 5.1 re-render can be generated and measured in loudness apps and DAW plug-ins.
Delivery specifications vary, but –23 LUFS, –24 LKFS, and –27 LKFS are common loudness targets.
True Peak (dBTP) targets are difficult to achieve with Dolby Atmos content, even if limiting is used in the DAW session. Rendering to 5.1 involves unpredictable summing. Additionally, True Peak measurement is interpolative by nature.
While True Peak limits are commonly specified in delivery requirements when working in 5.1, True Peak specifications should be treated as a target value when working in Dolby Atmos. As long as the dBTP measurements aim for –2 dBTP and do not exceed –0.1 dBTP, the limiter that is used in the encoding process will be sufficient to prevent audible clipping.
If True Peak limits must be met for channel-based Re-renders, the re-renders will need to be limited as an additional processing step in the DAW.
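The loudness and True Peak guidance above can be turned into a quick check on the values reported for a 5.1 re-render. The Python sketch below uses the –2 dBTP aim and –0.1 dBTP ceiling quoted in this article. The function names and result strings are illustrative, and the measured values must come from a loudness meter, since nothing here measures audio.

```python
TRUE_PEAK_AIM_DBTP = -2.0      # value to aim for, from this article
TRUE_PEAK_CEILING_DBTP = -0.1  # value not to exceed, from this article


def classify_true_peak(measured_dbtp: float) -> str:
    """Place a measured True Peak value relative to the aim and the ceiling."""
    if measured_dbtp <= TRUE_PEAK_AIM_DBTP:
        return "at or below aim"
    if measured_dbtp <= TRUE_PEAK_CEILING_DBTP:
        return "above aim, within ceiling"
    return "exceeds ceiling"


def gain_to_target(measured_loudness: float, target_loudness: float) -> float:
    """Approximate gain change in dB needed to move a measurement to the target.

    LUFS and LKFS are equivalent scales, and a gain change of x dB shifts
    the integrated loudness by approximately x LU.
    """
    return target_loudness - measured_loudness


print(classify_true_peak(-1.0))
print(gain_to_target(measured_loudness=-21.5, target_loudness=-24.0))
```

A mix measured at –21.5 LKFS against a –24 LKFS target would need roughly 2.5 dB of attenuation, after which loudness and True Peak should be measured again.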
The loudness measurement tool in the Dolby Atmos Renderer has a soft clip limiter inline that mimics the encoding process. If using external loudness measurement applications or plug-ins, use the “loudness” 5.1 Re-render that has this limiter applied to get consistent measurements.
Loudness measurement is also performed during the encoding process and the dialnorm metadata value is set according to the measured value.
Dolby Atmos Master Formats
The Dolby Atmos Renderer records up to 128 inputs of Bed and Object audio, together with OAMD; Binaural, Downmix, and Trim metadata; and Input and Re-render configurations.
These are recorded to a Dolby Atmos Master File (DAMF) set. This is the format native to the Dolby Atmos Renderer and is recorded as a three-file set consisting of:
- .atmos — An XML file containing information about the Dolby Atmos presentation and index information about the other files in the file set. The .atmos file includes the number of inputs used as Beds or Objects, frame rate, file start, first frame of action, the number of elements used in spatial coding, downmix, and trim metadata.
- .atmos.metadata — An XML file containing dynamic positional and size OAMD for each Object, along with binaural metadata settings.
- .atmos.audio — A Core Audio Format (CAF) file of up to 128 tracks of interleaved audio.
The .atmos and .atmos.metadata files can be opened for inspection with a text editor. However, direct editing of these files is not recommended, as the file set can become corrupted.
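Because a DAMF is only usable as a complete set, a quick read-only check for all three files can be useful before handing a master to another facility. The Python sketch below assumes the three files share a base name and sit in the same folder. The function name `missing_damf_files` and the example path are hypothetical. The script only checks that the files exist and never opens or modifies them.

```python
from pathlib import Path


def missing_damf_files(atmos_file: Path) -> list[Path]:
    """Return the members of the three-file set that are not present.

    atmos_file is the path to the .atmos file; the other two names are
    derived by appending '.metadata' and '.audio' (an assumption here).
    """
    expected = [
        atmos_file,
        Path(f"{atmos_file}.metadata"),
        Path(f"{atmos_file}.audio"),
    ]
    return [path for path in expected if not path.is_file()]


# Hypothetical location of a master file set.
missing = missing_damf_files(Path("masters/reel1/reel1.atmos"))
if missing:
    print("Incomplete file set, missing:", [path.name for path in missing])
else:
    print("All three DAMF files are present.")
```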
While a new master is always recorded as a DAMF, two other formats are used for distribution, for encoding, or further editing:
- ADM BWF – The Audio Definition Model Broadcast Wave Format (ADM BWF) is an alternative Dolby Atmos master format. With ADM BWF (sometimes referred to as ADM BWAV), all of the information otherwise held in the .atmos and .atmos.metadata files is included in a data chunk in the header of the WAV file. The audio payload itself is up to 128 tracks of interleaved audio. ADM BWF has several advantages:
  - ADM BWF is a single file instead of three files in a folder, making it easy to interchange with other facilities.
  - ADM BWF can be imported into DAWs. This allows all Bed and Object audio tracks to be recreated along with all the panning metadata, which in turn allows for subsequent editing (language replacement, timing conformance, censorship edits, etc.) prior to remastering.
  - ADM BWF can be encoded to Dolby TrueHD, Dolby Digital Plus JOC, and Dolby AC-4 IMS and is the primary deliverable to streaming operators and Blu-ray authoring.
- IMF.IAB – Immersive Audio Bitstream (IAB) is a mezzanine format for IMF (Interoperable Master Format). IAB is considered a mezzanine format rather than a master format, as OAMD is quantized. IAB.MXF is used by third-party IMF packaging tools to create a delivery container for both Dolby Atmos and video (including Dolby Vision).
While the Dolby Atmos Renderer natively records in the .atmos format only, it can convert to and export ADM BWF and IAB.MXF. The entire file can be exported, or basic top/tail (specified range) edits can be performed.
The Dolby Atmos Renderer can also open ADM BWF and IAB.MXF files as master files for playback, QC, basic top/tail editing, conversion (between the two formats), and re-export. However, some restrictions apply when ADM BWF and IAB.MXF files are opened: punch-in and other metadata editing are not permitted, and conversion to .atmos is not permitted.
The Dolby Atmos Conversion Tool (DACT) is a companion application to the Dolby Atmos Renderer and is required to convert from ADM BWF and IAB.MXF to .atmos, perform format and frame-rate conversions, as well as perform complex editing operations on master files. The Dolby Atmos Conversion Tool is a free utility.
Technical Delivery Specifications
The deliverables required by streaming services are spelled out in technical delivery specifications. These vary in terms of loudness and peak targets, the number and format of master files and channel-based deliverables, naming conventions, and more. Some specifications ask for Pro Tools sessions along with ADM BWF for archival purposes. Being aware of the deliverables required is crucial to achieving an efficient workflow.