Smart Audio Replacement in Live Streams and VOD

Smart Audio Replacement in Live Streams and VOD was one of the more challenging and complex projects we’ve completed in recent years. Audio replacement in live streams and VOD is a technique for changing the audio compression of a media stream or the language of its audio. It can also be used to add commentary to the existing audio stream.

The following article describes:

  • The technical approach we took to replace the audio in live MPEG TS based streams.
  • How to replace the audio compression of live MPEG TS based streams without re-multiplexing the stream.
  • Using the Smart Audio Replacement module to extract, decode, process, encode and insert the audio into the existing MPEG TS, keeping the video, metadata and timing information intact.
  • The technical challenges and solutions for audio replacement, such as packetization, synchronization and buffering.

Overview

Some time ago we were tasked with designing and implementing a software module for audio replacement in live MPEG TS based streams.

At that time RTP and UDP streams, both MPEG TS based, were still favored, so they were selected as the main input and output interfaces for the module. The purpose of audio replacement was to change the audio compression of an existing stream while keeping the existing multiplex, with all the video, metadata streams and timing information it carries, as intact as possible. The MPEG TS could be an MPTS (Multiple Program Transport Stream) carrying multiple programs with multiple audio streams in each program, and the module could perform the audio replacement on multiple pre-selected audio channels. The module would work for live streams as well as for VOD content encoded as MPEG TS.

Project requirements

The most intuitive approach to audio replacement of live MPEG TS based streams or VOD is a full re-multiplex of the stream. However, due to the specific project requirements in this case, that idea was rejected.

We decided to investigate ways of replacing the audio directly in the existing MPEG Transport Stream.

Smart Audio Replacement System

How we tackled the audio replacement

The first step in the workflow is to receive the network stream and extract the pre-selected PID stream, which contains the audio, from the MPEG TS. While demultiplexing, we store the TS packet positions as well as the TS and PES header information for each extracted packet. This information is used later to generate the new TS packets carrying the transcoded audio.
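The extraction step can be sketched in a few lines of Python. This is only an illustration under simple assumptions (a buffer that already starts on a packet boundary and 188-byte packets), not the module's actual code; `TsPacketInfo`, `parse_ts_packet` and `extract_pid` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


@dataclass
class TsPacketInfo:
    index: int                 # position of the packet in the stream, in packets
    pid: int
    payload_unit_start: bool   # True when a new PES packet starts in this packet
    continuity_counter: int
    has_adaptation: bool
    payload: bytes


def parse_ts_packet(packet: bytes, index: int) -> TsPacketInfo:
    """Parse the 4-byte TS header and skip the adaptation field, if any."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid 188-byte TS packet")
    pusi = bool(packet[1] & 0x40)
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    adaptation_field_control = (packet[3] >> 4) & 0x03
    continuity_counter = packet[3] & 0x0F
    has_adaptation = bool(adaptation_field_control & 0x02)
    has_payload = bool(adaptation_field_control & 0x01)
    offset = 4
    if has_adaptation:
        offset += 1 + packet[4]  # length byte plus the adaptation field itself
    payload = packet[offset:] if has_payload else b""
    return TsPacketInfo(index, pid, pusi, continuity_counter, has_adaptation, payload)


def extract_pid(stream: bytes, wanted_pid: int) -> List[TsPacketInfo]:
    """Collect every packet of one PID together with its stream position."""
    found = []
    for index in range(len(stream) // TS_PACKET_SIZE):
        start = index * TS_PACKET_SIZE
        info = parse_ts_packet(stream[start:start + TS_PACKET_SIZE], index)
        if info.pid == wanted_pid:
            found.append(info)
    return found
```

Keeping `index` and the header fields next to each payload is what later allows a new packet to be written back into the same slot of the multiplex.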

To get to the encoded audio we start at the MPEG TS stream layer and work down: MPEG TS packet separation, Transport Stream packet header parsing, Adaptation Field parsing, PES packet extraction and, finally, separation of the PES payload into audio frames based on the frame header of the audio compression in use. The resulting series of audio frames is then passed to the Audio Decoder component, which produces raw PCM audio. In the pure transcoding scenario the audio is not processed but passed directly to the Audio Encoder for encoding into the destination audio compression. With the audio available as raw PCM at this point, however, it is possible to process it before passing it to the encoder. Options include adjusting the audio level, applying audio filters, mixing the audio with commentary, or even replacing the audio with an entirely new PCM stream, for example to change the spoken language.
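As an illustration of the PES layer in that chain, the sketch below reads the PES header of a reassembled audio PES packet and decodes its presentation timestamp. It assumes a stream_id that uses the optional PES header (as audio streams do); `decode_timestamp` and `parse_pes_header` are hypothetical helper names, and splitting the returned payload into audio frames is codec-specific and not shown.

```python
from typing import Optional, Tuple


def decode_timestamp(field: bytes) -> int:
    """Decode a 33-bit PTS or DTS (90 kHz units) from its 5-byte encoding."""
    return (((field[0] >> 1) & 0x07) << 30
            | field[1] << 22
            | (field[2] >> 1) << 15
            | field[3] << 7
            | field[4] >> 1)


def parse_pes_header(pes: bytes) -> Tuple[int, Optional[int], bytes]:
    """Return (stream_id, pts or None, elementary stream payload)."""
    if pes[0:3] != b"\x00\x00\x01":
        raise ValueError("missing PES start code prefix")
    stream_id = pes[3]
    pts_dts_flags = pes[7] >> 6          # '10' = PTS only, '11' = PTS and DTS
    header_data_length = pes[8]          # size of the optional fields that follow
    pts = decode_timestamp(pes[9:14]) if pts_dts_flags & 0x02 else None
    payload = pes[9 + header_data_length:]
    return stream_id, pts, payload


# A PES packet with PTS = 90000, i.e. one second on the 90 kHz clock.
example = bytes([0x00, 0x00, 0x01, 0xC0, 0x00, 0x0B, 0x80, 0x80, 0x05,
                 0x21, 0x00, 0x05, 0xBF, 0x21]) + b"\x01\x02\x03"
stream_id, pts, payload = parse_pes_header(example)
print(hex(stream_id), pts, payload)
```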

The processed PCM is then passed to the audio encoder, which produces a series of encoded audio frames. The smart multiplex needs TS packets carrying the new audio, so we have to make our way back from the audio elementary stream to MPEG TS packets: generate a PES header based on the original PES header, group the encoded audio frames into a PES packet, and split the PES packet into TS packets with the same PID as the original audio, regenerating all TS headers and adaptation fields so that they match the information in the headers of the source TS packets.
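The way back from a PES packet to TS packets can be sketched as follows. This is a simplified illustration, not the module's implementation: it only carries over the PID and the continuity counter, and pads the last packet with adaptation field stuffing. Copying further adaptation field content from the source packets is left out. `packetize_pes` is a hypothetical name.

```python
from typing import List

TS_PAYLOAD_SIZE = 184  # 188-byte packet minus the 4-byte TS header


def packetize_pes(pes: bytes, pid: int, first_cc: int) -> List[bytes]:
    """Split one PES packet into 188-byte TS packets on the given PID."""
    packets = []
    cc = first_cc & 0x0F
    offset = 0
    while offset < len(pes):
        chunk = pes[offset:offset + TS_PAYLOAD_SIZE]
        pusi = 0x40 if offset == 0 else 0x00   # PES start only in the first packet
        header = bytearray([0x47, pusi | ((pid >> 8) & 0x1F), pid & 0xFF, 0x00])
        stuffing = TS_PAYLOAD_SIZE - len(chunk)
        if stuffing == 0:
            header[3] = 0x10 | cc              # payload only
            body = chunk
        else:
            header[3] = 0x30 | cc              # adaptation field followed by payload
            af_length = stuffing - 1           # the length byte itself is not counted
            if af_length == 0:
                adaptation = bytes([0x00])
            else:
                # flags byte with nothing set, then 0xFF stuffing bytes
                adaptation = bytes([af_length, 0x00]) + b"\xFF" * (af_length - 1)
            body = adaptation + chunk
        packets.append(bytes(header) + body)
        offset += len(chunk)
        cc = (cc + 1) & 0x0F                   # 4-bit counter wraps around
    return packets
```

In this sketch `first_cc` would come from the header information stored while demultiplexing, so that the continuity counter of the new packets continues the sequence of the original audio PID.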

MPEG TS Utils used for MPEG Transport Stream analysis and manipulation

We used MPEG TS Utils to analyze the source SPTS and MPTS streams. After the audio replacement we used it again to compare the original source stream with the manipulated stream.

Possible scenarios in audio replacement

Now that we have the new audio TS packets, we need to replace the original audio TS packets in the multiplexed TS stream with them. The newly encoded audio can have a different bitrate from the original audio, which results in a different number of audio TS packets.

Here we have three possible scenarios:

  1. We have fewer audio TS packets than in the original stream.
  2. We have the same number of audio TS packets.
  3. We have more audio TS packets than in the original stream.

The three scenarios need to be handled differently depending on the MPEG TS bitrate mode: Variable Bit Rate (VBR) or Constant Bit Rate (CBR).

VBR or CBR having the same number of audio TS packets

This is the ideal scenario which requires the minimum amount of work.

Simply replace the existing audio TS packets with the new ones. Nothing will change bitrate-wise for the stream.

VBR having fewer audio TS packets

Replace the existing TS packets with the new ones and remove the redundant original audio TS packets. The TS bitrate will be slightly reduced at this part of the stream.

VBR having more audio TS packets

Replace the existing TS packets with the new ones. The additional packets need to be added between the last audio TS packet and the next Program Clock Reference (PCR) in the TS stream. The bitrate will be slightly increased at this part of the TS stream.
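Both VBR cases can be illustrated with one small routine. The sketch below assumes it is given a window of TS packets that covers the original audio packets to be replaced, and that the new audio packets are already in stream order; `replace_audio_vbr` and `pid_of` are hypothetical names. Extra packets are inserted directly after the last original audio slot, which places them before any PCR that follows.

```python
from typing import List, Optional


def pid_of(packet: bytes) -> int:
    """Read the 13-bit PID from a TS packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def replace_audio_vbr(window: List[bytes], audio_pid: int,
                      new_audio: List[bytes]) -> List[bytes]:
    """Swap audio packets in a VBR stream; the packet count may shrink or grow."""
    pending = list(new_audio)
    out: List[bytes] = []
    insert_at: Optional[int] = None
    for packet in window:
        if pid_of(packet) != audio_pid:
            out.append(packet)             # video, tables, etc. stay untouched
        elif pending:
            out.append(pending.pop(0))     # new audio takes the original slot
            insert_at = len(out)           # position right after this audio packet
        # else: fewer new packets than slots, drop the redundant original packet
    if pending:
        # More new packets than slots: add them after the last audio slot.
        position = insert_at if insert_at is not None else len(out)
        out[position:position] = pending
    return out
```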

CBR having fewer audio TS packets

Replace the existing TS packets with the new ones and replace the redundant original audio TS packets with NULL TS packets. The TS bitrate won’t change and will remain constant for the stream.

CBR having more audio TS packets

This is the most complex scenario. Being CBR, the source TS stream contains NULL TS packets that sustain the constant bitrate. We can replace the audio TS packets with the new ones and, if there are enough NULL TS packets, replace some of those with the remaining new audio TS packets.
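A minimal sketch of the CBR handling, covering both the "fewer" and the "more" case, could look like the following. It is an assumption-laden illustration rather than the module's logic: it works on a window of packets, only borrows NULL packets that come after the first original audio packet, and does not model PCR-to-PTS distance or decoder buffering. `replace_audio_cbr` and `pid_of` are hypothetical names.

```python
from typing import List, Tuple

NULL_PID = 0x1FFF
NULL_PACKET = bytes([0x47, 0x1F, 0xFF, 0x10]) + b"\xFF" * 184


def pid_of(packet: bytes) -> int:
    """Read the 13-bit PID from a TS packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def replace_audio_cbr(window: List[bytes], audio_pid: int,
                      new_audio: List[bytes]) -> Tuple[List[bytes], List[bytes]]:
    """Swap audio packets without changing the number of packets in the window.

    Returns the rewritten window and any new audio packets that found no slot.
    """
    pending = list(new_audio)
    audio_slots = sum(1 for p in window if pid_of(p) == audio_pid)
    nulls_to_borrow = max(0, len(pending) - audio_slots)
    out: List[bytes] = []
    seen_audio = False
    for packet in window:
        pid = pid_of(packet)
        if pid == audio_pid:
            seen_audio = True
            # Fewer new packets than slots: the redundant slot becomes NULL.
            out.append(pending.pop(0) if pending else NULL_PACKET)
        elif pid == NULL_PID and seen_audio and nulls_to_borrow > 0 and pending:
            out.append(pending.pop(0))     # a NULL slot carries extra audio
            nulls_to_borrow -= 1
        else:
            out.append(packet)
    return out, pending
```

A non-empty second return value means the window did not contain enough NULL packets, which is the situation discussed next.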

It is important to always find enough NULL TS packets for the remaining new audio TS packets without delaying the audio stream, which would reduce the existing distance between the PCR and the PTS/DTS in the stream. If there are not enough NULL TS packets, simply adding the remaining audio TS packets would cause bitrate spikes and break the constant bitrate. The solution is to slightly increase the overall bitrate of the TS stream, still keeping it constant, which ensures there are enough NULL TS packets to be replaced by all the new audio TS packets. The size of the increase depends mainly on the ratio between the bitrate of the old compressed audio stream and that of the new one. Increasing the bitrate of course requires CBR/VBR detection and bitrate measurement on the source stream, plus a small CBR TS re-multiplex in the Smart Muxer module. With that in place there is enough room to replace the old audio TS packets with the new ones without shifting the audio elementary stream in either direction.
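To give a feel for the numbers, here is one possible first-order estimate of the new constant bitrate. The formula is an assumption made for illustration, not the calculation used in the project: it scales the audio bitrate difference by the 188/184 TS header overhead, ignores PES header overhead, and subtracts the NULL packet bitrate measured in the source. `required_cbr_bitrate` and its parameters are hypothetical.

```python
def required_cbr_bitrate(ts_bitrate: float, null_bitrate: float,
                         old_audio_bitrate: float, new_audio_bitrate: float,
                         margin: float = 0.0) -> float:
    """Estimate the constant TS bitrate (bit/s) needed to fit the new audio."""
    ts_overhead = 188 / 184  # every 184 payload bytes cost one 188-byte packet
    extra_audio = max(0.0, (new_audio_bitrate - old_audio_bitrate) * ts_overhead)
    # Only the part that existing NULL packets cannot absorb needs new bandwidth.
    shortfall = max(0.0, extra_audio - null_bitrate)
    return ts_bitrate + shortfall * (1.0 + margin)


# 10 Mbit/s CBR stream with 50 kbit/s of NULL packets, audio going from
# 128 kbit/s to 384 kbit/s.
print(round(required_cbr_bitrate(10_000_000, 50_000, 128_000, 384_000)))
```

When the NULL packet bitrate already covers the extra audio, the function returns the source bitrate unchanged, which corresponds to the case where no re-multiplex is needed.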

Unfortunately that’s not all. The module allows the audio codec to be changed, which means the Program Map Table (PMT) in the TS stream has to be updated, retaining or updating the existing descriptors. A small module takes care of updating the PMT with the new compression.
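A PMT update of this kind can be sketched as below. The example assumes a complete, single PMT section as input, changes only the stream_type of one elementary PID, keeps all descriptors as they are, and recalculates the section CRC with the MPEG-2 CRC-32. It does not touch the version_number. `update_pmt_stream_type` and `crc32_mpeg2` are hypothetical names.

```python
def crc32_mpeg2(data: bytes) -> int:
    """CRC-32/MPEG-2: polynomial 0x04C11DB7, initial value 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def update_pmt_stream_type(section: bytes, audio_pid: int,
                           new_stream_type: int) -> bytes:
    """Return a copy of a PMT section with a new stream_type for one PID."""
    if section[0] != 0x02:
        raise ValueError("not a PMT section (table_id must be 0x02)")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    crc_offset = 3 + section_length - 4
    out = bytearray(section[:crc_offset])
    program_info_length = ((out[10] & 0x0F) << 8) | out[11]
    pos = 12 + program_info_length           # first elementary stream entry
    while pos + 5 <= crc_offset:
        pid = ((out[pos + 1] & 0x1F) << 8) | out[pos + 2]
        es_info_length = ((out[pos + 3] & 0x0F) << 8) | out[pos + 4]
        if pid == audio_pid:
            out[pos] = new_stream_type       # descriptors are left untouched
        pos += 5 + es_info_length
    out += crc32_mpeg2(bytes(out)).to_bytes(4, "big")
    return bytes(out)
```

For example, changing an MPEG-1 audio stream (stream_type 0x03) to ADTS AAC (stream_type 0x0F) on PID 0x101 would be `update_pmt_stream_type(section, 0x101, 0x0F)`. Codec-specific descriptors, where needed, would have to be added or adjusted separately.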

The software was designed to work with continuous MPEG-TS based streams sourced from a file, UDP, RTP or any other interface providing the supported TS format.

Audio replacement in ABR streams

Standard HLS is based on the MPEG Transport Stream, and in its Adaptive Bit Rate (ABR) form it delivers the video and audio streams in different resolutions and bitrates. Audio replacement in HLS is applied to all audio streams or to pre-selected individual ones.

Smart Audio Replacement in HLS

In conclusion

Now you may ask: was it worth taking the above approach instead of the standard method, which involves fully demultiplexing and re-multiplexing all the streams while replacing the audio? We believe it was, as it preserved all the quality features of the source MPEG Transport Stream and kept the processing to a minimum for the given task, making the solution easily scalable.

Tsviatko Jongov

About the author

Tsviatko Jongov is a specialist in the area of digital multimedia technologies with more than 20 years of experience in the field. He started his career as a software developer, designing and building software solutions for the television broadcast industry, serving broadcasters around the globe. In 2008 he started Jongbel Media Solutions software company providing digital media analysis solutions for customers such as HBO, Microsoft, Adobe, SpaceX, Dolby and more.
