
How to determine if an audio track is a Dolby Pro Logic II mixdown

I'm trying to find out whether there's a way to determine if an AAC-encoded audio track carries a Dolby Pro Logic II mixdown. Is there a way of examining the file so that you can see this information? For example, I encoded a media file in HandBrake with (truncated to audio options) -E av_aac -B 320 --mixdown dpl2, and this is the audio track output that mediainfo shows:

Audio #1
ID                                       : 2
Format                                   : AAC
Format/Info                              : Advanced Audio Codec
Format profile                           : LC
Codec ID                                 : 40
Duration                                 : 2h 5mn
Bit rate mode                            : Variable
Bit rate                                 : 321 Kbps
Channel(s)                               : 2 channels
Channel positions                        : Front: L R
Sampling rate                            : 48.0 KHz
Compression mode                         : Lossy
Stream size                              : 288 MiB (3%)
Title                                    : Stereo / Stereo
Language                                 : English
Encoded date                             : UTC 2017-04-11 22:21:41
Tagged date                              : UTC 2017-04-11 22:21:41

but I can't tell if there's anything in this output that would suggest that it's encoded with DPL2 data.

WheresWardy asked Apr 13 '17 09:04



2 Answers

tl;dr: it's probably possible, and it may be easier if you're a programmer.

Because the information encoded is just a stereo analog pair, there is no guaranteed way of detecting a Dolby Pro Logic II (DPL2) signal therein, unless you specifically store your own metadata saying "this is a DPL2 file." But you can probably make a pretty good guess.

All of the old analog Dolby Surround formats, including DPL2, store surround information in two channels by inverting the phase of the surround or surrounds and then mixing them into the original left and right channels. Dolby Surround type decoders, including DPL2, attempt to recover this information by inverting the phase of one of the two channels and then looking for similarities in these signal pairs. This is either done trivially, as in Dolby Surround, or else these similarities are artificially biased to be pushed much further to the left or right, or the left or right surround, as in DPL2.
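To make the phase trick concrete, here is a toy sketch in Python. It assumes a simplified matrix in which the surround is mixed at -3 dB with opposite polarity into the two outputs. The function names and the coefficient are illustrative assumptions, and a real DPL2 encoder also applies phase shifts and different gains that are not modeled here.

```python
import math

G = 1 / math.sqrt(2)  # -3 dB gain; an assumed coefficient for this sketch


def encode_matrix(left, right, center, surround):
    """Fold four channels into a stereo pair (Lt, Rt), sample by sample.

    The surround goes into Lt with positive polarity and into Rt with
    inverted polarity, as described above. Toy model, not real DPL2.
    """
    lt = [l + G * c + G * s for l, c, s in zip(left, center, surround)]
    rt = [r + G * c - G * s for r, c, s in zip(right, center, surround)]
    return lt, rt


def passive_decode(lt, rt):
    """Recover center (sum) and surround (difference) from the stereo pair."""
    center = [G * (a + b) for a, b in zip(lt, rt)]
    surround = [G * (a - b) for a, b in zip(lt, rt)]
    return center, surround
```

With this toy matrix, content that was only in the center comes back in the sum and cancels in the difference, while content that was only in the surround does the opposite. That is why the difference signal is the interesting one to measure.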

So the trick is to detect whether important data is being stored in the surround channel(s). I'll sketch out for you a method that might work, and I'll try to express it without writing code, but it's up to you to implement and refine it to your liking.

  1. Crop the first N seconds or so of program content into a stereo file, where N is between one and thirty. Call this file Input.
  2. Mix down the Input stereo channels to a new mono file at -3dB per channel. Call this file Center.
  3. Split the left and right channels of Input into separate files. Call these Left and Right.
  4. Invert the right channel. Call this file RightInvert.
  5. Mix down the Left and RightInvert channels to a new mono file at -3dB per channel. Call this file Surround.
  6. Determine the RMS and peak dB of the Surround file.
  7. If the RMS or peak dB of the Surround file is below "a tolerance", stop; the original file is either mono or center-panned and hence contains no surround information. You'll have to experiment with several DPL2 and non-DPL2 sources to see what these tolerances are, but after a dozen or so files the numbers should become clear. I'm guessing around -30 dB or so.
  8. Invert the Center file into a new file. Call this file CenterInvert.
  9. Mix the CenterInvert file into the Surround file at 0 dB (both CenterInvert and Surround should be mono). Call this new file SurroundInvert.
  10. Determine the RMS and peak dB of the SurroundInvert file.
  11. If either the RMS or the peak dB of SurroundInvert is below "a tolerance", stop; your original source contains panned left or right front information, not surround information. You'll have to experiment with several DPL2 and non-DPL2 sources to see what these tolerances are, but after a dozen or so files the numbers should become clear. I'm guessing around -35 dB or so.
  12. If you've gotten this far, your original Input probably contains surround information, and hence is probably a member of the Dolby Surround family of encodings.
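For readers who would rather see steps 2 through 11 as code, here is a minimal Python sketch that works on two lists of float samples in the range -1.0 to 1.0. Decoding and cropping the audio (step 1) is left out, the function names are made up for illustration, and the default tolerances are just the guesses from the steps above. It follows the steps literally, so it inherits whatever limitations they have.

```python
import math

GAIN_3DB = 10 ** (-3 / 20)  # -3 dB as a linear factor, about 0.708


def rms_db(samples):
    """RMS level in dB relative to full scale (1.0); -inf for silence."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def peak_db(samples):
    """Peak level in dB relative to full scale (1.0); -inf for silence."""
    peak = max((abs(x) for x in samples), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def looks_matrix_encoded(left, right, surround_tol_db=-30.0, invert_tol_db=-35.0):
    """Return True if the stereo pair passes both level checks."""
    # Steps 2-5: Center is the sum, Surround is Left plus inverted Right.
    center = [GAIN_3DB * (l + r) for l, r in zip(left, right)]
    surround = [GAIN_3DB * (l - r) for l, r in zip(left, right)]

    # Steps 6-7: stop if the difference signal is too quiet.
    if rms_db(surround) < surround_tol_db or peak_db(surround) < surround_tol_db:
        return False

    # Steps 8-9: mix the inverted Center into Surround at 0 dB.
    surround_invert = [s - c for s, c in zip(surround, center)]

    # Steps 10-11: stop if what remains is too quiet.
    if (rms_db(surround_invert) < invert_tol_db
            or peak_db(surround_invert) < invert_tol_db):
        return False

    # Step 12: probably carries surround information.
    return True
```

You would call looks_matrix_encoded(left, right) on the first N seconds of decoded samples and then tune the two tolerances against files you know to be DPL2 and non-DPL2.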

I've written this algorithm out such that you can do each of these steps with a specific command in sox. If you want to be fancier, instead of doing the RMS/peak value step in sox, you could run an ebur128 program and check your levels in LUFS against a tolerance. If you want to be even fancier, after you create the Surround and Center files, you could filter out all frequencies higher than 7kHz and do de-emphasis on them, just like a real DPL2 decoder would.

To keep this algorithm simple, I've sketched it out entirely in the amplitude domain. The Surround and SurroundInvert levels would probably be calculated a lot more accurately in the frequency domain, if you know how to calculate the magnitude and angle of FFT bins and you use windows of 30 to 100 ms. But the cheapo version above should get you started.
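As a rough illustration of the frequency-domain idea, the Python sketch below takes one window of left and right samples, computes the magnitude and angle of each bin's cross-spectrum, and reports how much of the energy sits in bins where the two channels are roughly out of phase. The naive DFT, the Hann window, the 45-degree acceptance angle, and all function names are assumptions made for this example. A real implementation would use an FFT library and average over many windows.

```python
import cmath
import math


def dft(frame):
    """Naive DFT of a real frame, bins 0..N/2. O(N^2), fine for a sketch."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def hann(n):
    """Hann window coefficients for a frame of n samples."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def antiphase_energy_ratio(left, right, max_angle_from_pi=math.pi / 4):
    """Fraction of cross-spectrum magnitude in bins where L and R are
    within max_angle_from_pi radians of being 180 degrees apart."""
    window = hann(len(left))
    left_bins = dft([x * w for x, w in zip(left, window)])
    right_bins = dft([x * w for x, w in zip(right, window)])

    total = 0.0
    antiphase = 0.0
    for lb, rb in zip(left_bins, right_bins):
        cross = lb * rb.conjugate()   # angle of cross = phase(L) - phase(R)
        weight = abs(cross)           # magnitude weights loud bins more
        total += weight
        if abs(abs(cmath.phase(cross)) - math.pi) <= max_angle_from_pi:
            antiphase += weight
    return antiphase / total if total > 0 else 0.0
```

A ratio near zero suggests in-phase (front) content in that window, and a ratio near one suggests out-of-phase (surround-like) content. Where to put the cutoff in between is something you would have to find by experiment.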

One last caution. AAC is a modern psychoacoustic codec, which means that it likes to play games with stereo phasing and imaging to achieve its compression. So I consider it likely that the mere act of encapsulating DPL2 into an AAC stream will hose some of the imaging present in DPL2. To be candid, neither DPL2 nor AAC belongs anywhere in this pipeline. If you must store an analog stream originally encoded with DPL2, do it in a lossless format like WAV or FLAC, not AAC.

As of this writing, operational concepts behind Dolby Pro Logic (I) are here. These basic concepts still apply to DPL2; operational concepts for DPL2 are here.

johnwbyrd answered Oct 08 '22 08:10



If the file has more than two channels, you can with some certainty assume that they are used for surround purposes, although they could be just multiple tracks. In this case, if the file header doesn't say what to do, it falls on the playing system to do with the channels as it "thinks" best.

But your file is stereo. If you want to know whether it is a virtual surround file, you can look in the header for an encoder field to see which encoder was used. This may help somewhat, although not much. The encoder field is usually left empty, and the encoder doesn't have to be the same program as the recoder that mixed down the surround data: the recoder first creates raw PCM data, then feeds it to some encoder (AAC or whatever) to produce the compressed file. Also, there are many applications and versions, and the encoder field varies with them, so tracking all of them would be nasty work.

However, you can, with over 60% certainty, deduce whether something is virtual surround or not by examining the data. This would be advanced DSP and, for speed, even machine learning may be involved. You would have to find out whether the stereo signals contain certain features of an HRTF (head-related transfer function). This may be achieved by examining intensity difference and delay features between the same sounds appearing in the two channels in the time domain, and harmonic features (characteristic frequency changes) in the frequency domain. You would have to do both, because one without the other may just tell you that something is a very good stereo recording, not virtual surround. I don't know whether HRTF-specific features are mapped somewhere already, or whether you would need to do it yourself.
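As a very rough illustration of the two time-domain features mentioned above, this Python sketch estimates the level difference between the channels and the delay at which the right channel best matches the left, using a brute-force cross-correlation. The function names are made up for this example, and it is nowhere near a full HRTF detector, which would also need the frequency-domain features.

```python
import math


def level_difference_db(left, right):
    """Left-minus-right RMS level in dB, or None if either side is silent."""
    def rms(samples):
        return math.sqrt(sum(v * v for v in samples) / len(samples))

    l, r = rms(left), rms(right)
    if l == 0 or r == 0:
        return None
    return 20 * math.log10(l / r)


def best_lag(left, right, max_lag):
    """Lag in samples at which right best matches left.

    A positive result means the right channel is delayed relative to
    the left. Brute-force cross-correlation, O(n * max_lag).
    """
    n = min(len(left), len(right))
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Only use indices where both left[i] and right[i + lag] exist.
        start, stop = max(0, -lag), min(n, n - lag)
        score = sum(left[i] * right[i + lag] for i in range(start, stop))
        if score > best_score:
            best, best_score = lag, score
    return best
```

You would run both on short windows of the decoded stereo signal and look at how the level difference and the lag move together over time.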

It's a very complicated solution that takes a lot of time to make properly. Also, its performance would be problematic.

With this method you can also break the stereo mixdown back into nearly the original surround channels. But other methods are used for stereo-to-surround conversion, and they sound good.

If you are determined to perform such a detection, expect to dedicate half a year or more of hard work if no HRTF features are mapped, or a few weeks if they are. Brace yourself for big stress, and I wish you luck. I have done something similar. It is a killer.

If you want an out-of-the-box solution, then the answer to your question is no, unless the header provides an encoder field and the encoder is distinctive and known to be used only for surround-to-stereo conversion. I do not think anyone has done this from the actual data as I described, or if they have, it is part of a commercial product. Doing what you want is not usually needed, but it can be done.

Oh, BTW, try googling HRTF inversion; it might give some help.

Dalen answered Oct 08 '22 09:10
