The Immersive Audio Model and Formats
(IAMF) encoder within iamf-tools is designed to create IAMF bitstreams which
are compliant with the IAMF standard, as defined by the Alliance for Open
Media (AOM).
The IamfEncoderInterface encodes an IAMF bitstream based on various
configuration settings and input audio. It supports encoding on a frame-by-frame
basis. This is especially useful for streaming applications.
IamfEncoderFactory provides functions to create an encoder.
CreateIamfEncoderis useful to directly stream IAMF content.CreateFileGeneratingEncoderis useful to automatically produce a.iamffile.
Either creation mode is configured by a UserMetadata. Many properties of the
IAMF bitstream can be changed. See
Textproto templates for common
configurations.
In either case, the encoder is used in a similar manner. The below code uses a
streaming variable to differentiate these cases.
Prepare the encoder and any buffers which can be reused between frames.
// Get an encoder.
std::unique_ptr<IamfEncoderInterface> encoder = ...;
// Reusable buffer, later redundant copies won't change size.
std::vector<uint8_t> descriptor_obus;
// When writing directly to a file `CreateFileGeneratingEncoder`, descriptor
// OBUS are automatically managed. When streaming `CreateIamfEncoder`,
// descriptor OBUs should be broadcast periodically to help downstream clients
// synchronize to a stream.
bool streaming = ...;
if (streaming) {
bool kNoRedundantCopy = false;
// Broadcast the first set of descriptor OBUs, to allow consumers to sync.
// Using `CreateFileGeneratingEncoder` automatically handles descriptors. If
// not streaming it is safe to skip the call.
RETURN_IF_NOT_OK(
encoder->GetDescriptorObus(kNoRedundantCopy, descriptor_obus));
}
// Reusable input buffer. Hardcode an LFE and stereo audio element to show
// multi-dimensionality. Hardcode 1024 samples per frame. The same
// backing allocation can be used for each frame.
using ChannelLabel::Label;
const absl::Span<const double> kEmptyFrame;
IamfTemporalUnitData temporal_unit_data = {
.parameter_block_id_to_metadata = {},
.audio_element_id_to_data =
{
{0, {{LFE, kEmptyFrame}}},
{1, {{L2, kEmptyFrame}, {R2, kEmptyFrame}}},
},
};
// Reusable buffer. It will grow towards the maximum size of an output
// temporal unit.
std::vector<uint8_> temporal_unit_obus;
// Repeat descriptors every so often to help clients sync. In practice,
// an API user would determine something based on their use case. For
// example to aim for ~5 seconds of output audio between descriptors.
const int kDescriptorRepeatInterval = 100;Process temporal units until this is no more input audio.
while (encoder->GeneratingTemporalUnits()) {
if (streaming && iteration_count % kDescriptorRepeatInterval == 0) {
bool kRedundantCopy = true;
// Broadcast the redundant descriptor OBUs.
RETURN_IF_NOT_OK(
encoder->GetDescriptorObus(kRedundantCopy, descriptor_obus));
}
// Fill `temporal_unit_data` for this frame.
for (each audio element) {
for (each channel label from the current element) {
// Fill the slot in `temporal_unit_data` for this audio element and
// channel label.
}
}
// Fill any parameter blocks.
for (each parameter block metadata) {
// Fill the slot in `temporal_unit_data` for this parameter block.
}
RETURN_IF_NOT_OK(encoder->Encode(temporal_unit_data));
if (done_receiving_all_audio) {
encoder->FinalizeEncode();
}
// Flush OBUs for the next temporal unit.
encoder->OutputTemporalUnit(temporal_unit_obus);
if (streaming) {
// Broadcast the temporal unit descriptor OBUs.
}
// Otherwise, the temporal units were automatically flushed to the file.
}// Data generation is done. Perform some cleanup.
if (streaming) {
bool kNoRedundantCopy = false;
RETURN_IF_NOT_OK(
encoder->GetDescriptorObus(kNoRedundantCopy, descriptor_obus));
// If any consumers require accurate descriptors (loudness), notify them.
}
// Otherwise, they were flushed to file after the last temporal unit was output.