Is sample accurate decoding from an arbitrary FLAC seekpoint possible?

Is FLAC-decoded PCM guaranteed to start exactly at the sample of an arbitrary seekpoint, if we fetch via HTTP range request from the seekpoint's noted offset (with a fixed block size, e.g. 1024)?

I encounter missing samples at the start.

Broader context of my problem: I want to build a streaming app.

I fetch 3 *.flac files (with 4, 5 and 7 channels), decode them, then stitch the 16 channels together for later Ambisonics decoding. They have to be sample accurate. I already know that I have to feed more data than seekpoint_A_offset to seekpoint_B_offset in order to guarantee the minimum length to B; whatever is too much can be cut, since we know from the parsed SEEKTABLE how many samples we need.

However I have to be certain of the fact that the decoder started on seekpoint_A_sample.

Can you confirm? (otherwise the error is somewhere else in my signal chain)
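The over-fetch-then-trim plan described above can be sketched as a small helper (all seekpoint numbers below are hypothetical):

```javascript
// Sketch of the over-fetch-then-trim arithmetic: fetch past B so the last frame
// is complete, then cut the surplus after decoding. Seekpoint values are made up;
// real ones come from the parsed SEEKTABLE.
function planWindow(seekA, seekB, nextSeekpoint, fudgeBytes = 64 * 1024) {
  const wantSamples = seekB.sample - seekA.sample;  // exact length A→B
  const byteStart = seekA.offset;                   // frame-aligned start
  // Fetch up to the next seekpoint plus a fudge; null means "to EOF"
  const byteEnd = nextSeekpoint ? nextSeekpoint.offset + fudgeBytes - 1 : null;
  return { byteStart, byteEnd, wantSamples };
}

const plan = planWindow(
  { sample: 0, offset: 0 },
  { sample: 1024 * 100, offset: 250000 },
  { sample: 1024 * 200, offset: 500000 }
);
console.log(plan.wantSamples); // 102400
```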

My toolchain:

1. $ ffmpeg                        # for splitting the original 16 channel audio file
2. $ flac --compression-level-8    # for encoding 
          --blocksize=1024
          --force-raw-format
          --endian=little
          --sign=signed
          --channels=7
          --bps={bits_per_sample}
          --sample-rate={sample_rate}
          --no-seektable
3. $ metaflac --dont-use-padding --add-seekpoint=2530x    # for adding seekpoints
                                                          # that are sample synchronous
                                                          # across files
4. HTTP range requests to fetch parts of the 3 flac files
   from the given offset values of the seekpoints
5. @wasm-audio-decoders/flac for decoding the files in parallel

In the code I have so far, we miss samples between windows, i.e. between A→B and B→C:

    /**
     * All info for a Range-Request A→B
     * (incl. FUDGE, and EOF edge case).
     *
     * usage:
     * const {byteStart, byteEnd, wantSamples} =
     *       SeekPoints.rangeFor(aIdx, bIdx, seekTable);
     *
     * const bytes = await fetchRange(url, byteStart, byteEnd);
     *
     * @param indexA      start index (inclusive)
     * @param indexB      end index (inclusive!)
     * @param seekTable   full seek table
     * @param fudgeBytes  optional safety margin in bytes (default 64 KiB)
     * @returns {
     *   byteStart: number,
     *   byteEnd:   number|null,   // null → to EOF
     *   wantSamples: number       // sample_B – sample_A
     * }
     */
    static rangeFor(indexA, indexB, seekTable, fudgeBytes = 64 * 1024) {
        if (indexA < 0 || indexB >= seekTable.length || indexA >= indexB)
            throw new Error('invalid indices');

        const byteStart = seekTable[indexA].streamOffset;
        const nextOff = seekTable[indexB + 1]?.streamOffset ?? null;
        const byteEnd = nextOff !== null ? nextOff + fudgeBytes - 1 : null;

        const wantSamples = seekTable[indexB].sample - seekTable[indexA].sample;
        return {byteStart, byteEnd, wantSamples};
    }

    /**
     * Fetches a byte range and returns it as an Uint8Array.
     * @param {string} url
     * @param {number} offset
     * @param {number} length
     * @returns {Promise<Uint8Array>}
     */
    async #fetchRange(url, offset, length) {
        const headers = {};
        // Open-ended request if length is not positive
        if (length === undefined || length <= 0) {
            headers.Range = `bytes=${offset}-`;
        } else {
            headers.Range = `bytes=${offset}-${offset + length - 1}`;
        }
        const response = await fetch(url, {headers});
        if (!response.ok && response.status !== 206) {
            throw new Error(`[fetchRange] Unexpected HTTP response ${response.status} for ${url}`);
        }
        const arrayBuffer = await response.arrayBuffer();
        return new Uint8Array(arrayBuffer);
    }

    /**
     * Fetch + decode PCM between two given seekpoints in a given FLAC using SeekPoints.rangeFor().
     * Ensures last frame completeness via nextSeekpoint + FUDGE and trims
     * the decoded PCM to exactly sample_B − sample_A.
     *
     * This function is an internal step that will be called multiple times
     * until the buffer is filled.
     *
     * @returns {Promise<Float32Array[]>}
     */
    #fetchAndDecodeFlacFileFromSeekPointAtoB(
        url,
        seekTable,
        firstIndex,
        lastIndex,
        sampleRate,
        expectedChannels
    ) {
        console.log("[#fetchAndDecodeFlacFileFromSeekPointAtoB]", firstIndex, lastIndex);
        const {byteStart, byteEnd, wantSamples} = SeekPoints.rangeFor(firstIndex, lastIndex, seekTable);
        const byteLength = byteEnd !== null ? (byteEnd - byteStart + 1) : undefined;

        return this.#fetchRange(url, byteStart, byteLength)
            .then(rawFlacData => {
                const decoder = new FLACDecoderWebWorker();
                return decoder.ready
                    .then(() => decoder.decode(rawFlacData))
                    .finally(() => decoder.free());
            })
            .then(decoded => {
                if (!decoded.channelData || decoded.channelData.length !== expectedChannels) {
                    throw new Error(`[decode] Channel count mismatch: expected ${expectedChannels}, got ${decoded.channelData?.length}`);
                }
                // Per-channel trimming
                return decoded.channelData.map((channelArray, channel) => {
                    if (channelArray.length === wantSamples) {
                        return channelArray;
                    } else if (channelArray.length > wantSamples) {
                        // Overlong, slice
                        return channelArray.subarray(0, wantSamples);
                    } else {
                        throw new Error(`[decode] File ${url} channel ${channel} decoded ${channelArray.length} samples, expected ${wantSamples}`);
                    }
                });
            });
    }

// ---- invocation ----

const fetchAndDecodeAllFlacFilesPromises = this.fileList.map((url, fileIndex) => {
    const meta = this.flacMetaData[fileIndex];
    const seekTable = meta.seekTable;
    return this.#fetchAndDecodeFlacFileFromSeekPointAtoB(
        url,
        seekTable,
        this.#windowSeekpointFirstIndex,
        this.#windowSeekpointLastIndex,
        this.#trackSampleRate,
        meta.streamInfo.channels
    );
});
const decodedPcmFromAllFlacFiles = await Promise.all(fetchAndDecodeAllFlacFilesPromises);

decodedPcmFromAllFlacFiles.flat().forEach((channelData, chIndex) => {
    const first64 = channelData.slice(0, 64);
    const last64 = channelData.slice(-64);
    if (chIndex === 0) {
        console.log(`[Channel ${chIndex}] First 64 samples:`, first64);
        console.log(`[Channel ${chIndex}] Last 64 samples:`, last64);
    }
});
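For reference, the range and trim steps above can be exercised in isolation with a synthetic seek table and synthetic decoded channels (no network or decoder involved); `rangeFor` here is a standalone rendition of the class method above:

```javascript
// Standalone rendition of the rangeFor + per-channel trim steps, run against
// a synthetic seek table and fake "decoded" channel data.
function rangeFor(indexA, indexB, seekTable, fudgeBytes = 64 * 1024) {
  if (indexA < 0 || indexB >= seekTable.length || indexA >= indexB)
    throw new Error('invalid indices');
  const byteStart = seekTable[indexA].streamOffset;
  const nextOff = seekTable[indexB + 1]?.streamOffset ?? null;
  const byteEnd = nextOff !== null ? nextOff + fudgeBytes - 1 : null;
  const wantSamples = seekTable[indexB].sample - seekTable[indexA].sample;
  return { byteStart, byteEnd, wantSamples };
}

function trimChannels(channelData, wantSamples) {
  return channelData.map((ch, i) => {
    if (ch.length < wantSamples)
      throw new Error(`channel ${i}: ${ch.length} < ${wantSamples}`);
    return ch.subarray(0, wantSamples);  // cut decoder surplus past sample_B
  });
}

const seekTable = [
  { sample: 0, streamOffset: 0 },
  { sample: 2048, streamOffset: 4000 },
  { sample: 4096, streamOffset: 8100 },
];
const { byteStart, byteEnd, wantSamples } = rangeFor(0, 1, seekTable);
// pretend the decoder returned a little more than wantSamples; trim to exactly 2048
const trimmed = trimChannels([new Float32Array(2100), new Float32Array(2100)], wantSamples);
console.log(byteStart, byteEnd, wantSamples, trimmed[0].length); // 0 73635 2048 2048
```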
asked Oct 24 '25 by Gabriel Wolf

2 Answers

This answer assumes the following:

You already have a SEEKTABLE per file that gives you (seekSample, seekOffsetByte) pairs (e.g. produced by metaflac --add-seekpoint).

Files share the same sample rate and sample indices are comparable across files.

You know how many samples you need from each file for your stitched segment (call it segmentLengthSamples).

Decoder outputs PCM in Float32 frames (if yours outputs Int16/Int32 you’ll convert accordingly).

@wasm-audio-decoders/flac or your decoder exposes a way to decode a chunk (ArrayBuffer) and yields decoded PCM blocks with sample counts and channel count.

Constants to Tune

const SAFETY_BYTES = 64 * 1024; // 64KB, safe margin to include frame header and metadata
const MAX_FETCH_SIZE = 2 * 1024 * 1024; // 2MB chunk fetch limit (tune as needed)

Full implementation

 
// Utility: HTTP range fetch -> returns ArrayBuffer
async function fetchRange(url, start = 0, end = null) {
  const headers = {};
  if (end == null) headers['Range'] = `bytes=${start}-`;
  else headers['Range'] = `bytes=${start}-${end}`;
  const r = await fetch(url, { headers });
  if (!r.ok && r.status !== 206) {
    throw new Error(`Range request failed: ${r.status} ${r.statusText}`);
  }
  return await r.arrayBuffer();
}

// ----------------------------------------------------------------------------
// Decoder adapter (plug your decoder here)
// ----------------------------------------------------------------------------
// Example adapter interface assumed by the rest of the code:
//  - createDecoder(): returns a decoder object
//  - decoder.pushBytes(u8array): feed raw flac bytes (can be incremental)
//  - decoder.decodeAvailable(): returns array of decoded PCM blocks:
//      [{ pcm: Float32Array, channels: <n>, samples: <numSamples>, sampleRate }]
//  - decoder.isEnded() or decoder.needsMoreData()
//  - decoder.finish() to flush and get remaining output
//
// Implement these methods for @wasm-audio-decoders/flac (or adapt below).
// ----------------------------------------------------------------------------

function createWasmFlacDecoderAdapter() {
  // *** Replace this with real @wasm-audio-decoders/flac usage ***
  // PSEUDO-code/placeholder:
  const internal = {
    // example internal state for a hypothetical decoder library
    // `lib` would be your actual wasm/flac instance
    lib: null,
  };

  return {
    async init() {
      // e.g. await FlacDecoderModule.ready();
      // internal.lib = new FlacDecoderModule.Decoder();
      // Return when ready
    },
    pushBytes(u8) {
      // Feed bytes into the decoder input buffer
      // e.g. internal.lib.push(u8);
      // Return nothing
    },
    decodeAvailable() {
      // Pull out decoded PCM frames if available.
      // Must return an array of objects:
      // { pcm: Float32Array, channels: nChannels, samples: nSamples, sampleRate }
      // For example:
      // return internal.lib.getDecodedBlocks().map(b => ({
      //   pcm: b.float32Interleaved,
      //   channels: b.channels,
      //   samples: b.samples,
      //   sampleRate: b.sampleRate
      // }));
      return [];
    },
    finish() {
      // Flush decoder and return remaining decoded blocks
      return [];
    }
  };
}

// ----------------------------------------------------------------------------
// Core routine: fetch, decode, trim to desired sample range
// ----------------------------------------------------------------------------
/**
 * Fetches from a given seekOffset (byte offset) with safety margin,
 * decodes until we have enough PCM to cover [targetSample, targetSample+neededSamples),
 * and returns trimmed PCM (Float32 interleaved) for that single file.
 *
 * @param {string} url - url of .flac file
 * @param {number} seekSample - sample index we want to start at (absolute sample)
 * @param {number} seekOffset - byte offset from the seektable for seekSample (frame start offset)
 * @param {number} neededSamples - how many samples we require starting at seekSample
 * @param {object} decoderFactory - result of createWasmFlacDecoderAdapter()
 * @param {number} sampleRate - sample rate (for sanity checks)
 * @returns {Promise<{pcm: Float32Array, channels: number, sampleRate: number}>}
 */
async function fetchDecodeTrim(url, seekSample, seekOffset, neededSamples, decoderFactory, sampleRate) {
  const fetchStart = Math.max(0, seekOffset - SAFETY_BYTES);
  // we could fetch just enough bytes until we decode to neededSamples but for simplicity fetch a chunk
  const fetchEnd = fetchStart + MAX_FETCH_SIZE - 1;

  const decoder = decoderFactory;
  if (decoder.init) await decoder.init();

  // Fetch range
  const fetched = await fetchRange(url, fetchStart, fetchEnd);
  const u8 = new Uint8Array(fetched);
  decoder.pushBytes(u8);

  // decode loop (pull all available decoded blocks)
  let decodedBlocks = [];
  decodedBlocks.push(...decoder.decodeAvailable());

  // if not enough decoded samples after initial chunk, fetch more until satisfied
  let totalDecodedSamplesSoFar = decodedBlocks.reduce((s, b) => s + b.samples, 0);
  let nextStart = fetchEnd + 1;
  while (totalDecodedSamplesSoFar < neededSamples) {
    // If the decoder can tell it needs more data, request more bytes after the
    // range we already fetched (incremental chunks)
    const nextEnd = nextStart + MAX_FETCH_SIZE - 1;
    try {
      const nextBuf = await fetchRange(url, nextStart, nextEnd);
      const nextU8 = new Uint8Array(nextBuf);
      decoder.pushBytes(nextU8);
      decodedBlocks.push(...decoder.decodeAvailable());
      totalDecodedSamplesSoFar = decodedBlocks.reduce((s, b) => s + b.samples, 0);
      nextStart = nextEnd + 1; // advance the cursor so each iteration fetches new bytes
    } catch (err) {
      // if we hit EOF or range request fails, break
      console.warn('Range fetch next chunk failed or EOF', err);
      break;
    }
  }

  // finally flush decoder
  decodedBlocks.push(...(decoder.finish ? decoder.finish() : []));

  if (decodedBlocks.length === 0) {
    throw new Error('No decoded blocks available');
  }

  // All decoded blocks are assumed interleaved Float32 arrays (N channels interleaved)
  // Concatenate decoded blocks into one interleaved buffer
  const channels = decodedBlocks[0].channels;
  const sr = decodedBlocks[0].sampleRate || sampleRate;
  const totalSamples = decodedBlocks.reduce((acc, b) => acc + b.samples, 0);

  // Concatenate interleaved PCM
  const interleaved = new Float32Array(totalSamples * channels);
  let writePtr = 0;
  for (const block of decodedBlocks) {
    interleaved.set(block.pcm, writePtr);
    writePtr += block.pcm.length;
  }

  // We must know the absolute sample index of the first decoded sample.
  // If we fetched from seekOffset which corresponds to seekSample (frame start),
  // and there were no encoder-introduced earlier samples, then
  // firstDecodedSampleIndex = seekSample - (samplesInFramesBeforeTarget)
  //
  // In practice: when you start decoding at a frame whose first sample is <= seekSample,
  // the first decoded sample's absolute index is the frame's sample index.
  // To be safe, we compute the "decodedFirstSample" relative to the seekpoint:
  //
  // If you fetched from seekOffset corresponding to seekSample, and seekOffset truly points
  // to a frame that starts exactly at seekSample, then decodedFirstSample = seekSample.
  // But if we fetched earlier than the frame (due to safety bytes), the decoder will
  // decode the earlier frame that starts at some sample <= seekSample.
  //
  // So the safe approach: use the SEEKTABLE you generated to know the exact frame sample
  // number for 'seekOffset'. This should be passed in as `seekSample`.
  //
  // We then compute:
  const decodedFirstSample = seekSample; // assume seekOffset's frame first sample equals seekSample

  // Calculate relative indices in interleaved buffer
  // startSampleRelative = seekSample - decodedFirstSample = 0 if assumption holds
  const startSampleRelative = 0; // if your SEEKTABLE offset points exactly to seekSample
  const startFrameIndex = startSampleRelative * channels;
  const samplesToTake = Math.min(neededSamples, totalSamples - startSampleRelative);

  if (samplesToTake <= 0) {
    throw new Error('Not enough decoded samples to fulfill request');
  }

  // Slice out the required portion
  const resultInterleaved = interleaved.subarray(startFrameIndex, startFrameIndex + samplesToTake * channels);

  return {
    pcm: resultInterleaved.slice(0), // copy
    channels,
    sampleRate: sr,
    startSample: seekSample,
    samplesReturned: samplesToTake
  };
}

// ----------------------------------------------------------------------------
// Stitching multiple files (each with N channels) into one big multichannel
// Input: array of per-file results; each has `pcm` interleaved, `channels`, `samplesReturned` and `startSample`.
// Output: interleaved PCM with channels = sum of channels
// ----------------------------------------------------------------------------
function stitchFiles(fileResults, segmentStartSample, segmentLengthSamples) {
  // Validate sample alignment
  for (const r of fileResults) {
    if (r.startSample !== segmentStartSample) {
      console.warn('Start sample mismatch', r.startSample, 'expected', segmentStartSample);
      // In case of mismatch, you'd normally align by padding/trimming
    }
  }

  const totalChannels = fileResults.reduce((acc, r) => acc + r.channels, 0);
  const out = new Float32Array(segmentLengthSamples * totalChannels);

  // For each sample index s in [0, segmentLengthSamples)
  //   for each file f:
  //     for each c in file.channels:
  //       out[ s*totalChannels + chPtr ++ ] = file_pcm[ s*file.channels + c ]
  //
  // We'll do this per file for performance
  let chOffset = 0;
  for (const file of fileResults) {
    const fileChannels = file.channels;
    const fileSamples = file.samplesReturned;
    const pcm = file.pcm; // interleaved length fileSamples*fileChannels
    // If fileSamples < segmentLengthSamples, pad with zeros
    const minSamples = Math.min(segmentLengthSamples, fileSamples);

    for (let s = 0; s < minSamples; s++) {
      const dstBase = s * totalChannels + chOffset;
      const srcBase = s * fileChannels;
      // copy channels
      for (let c = 0; c < fileChannels; c++) {
        out[dstBase + c] = pcm[srcBase + c];
      }
    }
    if (fileSamples < segmentLengthSamples) {
      // zero pad remaining frames for this file channels
      for (let s = fileSamples; s < segmentLengthSamples; s++) {
        const dstBase = s * totalChannels + chOffset;
        for (let c = 0; c < fileChannels; c++) {
          out[dstBase + c] = 0;
        }
      }
    }
    chOffset += fileChannels;
  }

  return { pcm: out, channels: totalChannels, sampleRate: fileResults[0].sampleRate };
}

// ----------------------------------------------------------------------------
// Example orchestration: fetch 3 flac files, align to same seekSample and stitch them
// ----------------------------------------------------------------------------
async function fetchAndStitchExample() {
  // Example input: you must supply these from your SEEKTABLE metadata
  const files = [
    { url: 'https://example.com/file1.flac', channels: 4, seekSample: 123456, seekOffset: 987654 },
    { url: 'https://example.com/file2.flac', channels: 5, seekSample: 123456, seekOffset: 876543 },
    { url: 'https://example.com/file3.flac', channels: 7, seekSample: 123456, seekOffset: 765432 },
  ];
  const segmentStart = 123456; // absolute sample where segment must begin
  const segmentLength = 48000; // e.g. 1 second at 48kHz

  const decoderFactory = createWasmFlacDecoderAdapter(); // adapt to real decoder

  const perFileResults = [];
  for (const f of files) {
    // Request enough samples for segmentLength
    const res = await fetchDecodeTrim(f.url, f.seekSample, f.seekOffset, segmentLength, decoderFactory, 48000);
    perFileResults.push(res);
  }

  const stitched = stitchFiles(perFileResults, segmentStart, segmentLength);
  console.log('stitched channels', stitched.channels, 'samples', segmentLength);
  // stitched.pcm is interleaved Float32Array of length segmentLength * channels
  return stitched;
}
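To sanity-check the interleaving layout that stitchFiles produces, here is a minimal standalone version run on two tiny fake "files" (2 + 1 channels); values and shapes are illustrative only:

```javascript
// Minimal check of the stitching layout: two files (2 + 1 channels) merged into
// one 3-channel interleaved buffer, sample by sample.
function stitch(fileResults, segmentLengthSamples) {
  const totalChannels = fileResults.reduce((acc, r) => acc + r.channels, 0);
  const out = new Float32Array(segmentLengthSamples * totalChannels);
  let chOffset = 0;
  for (const file of fileResults) {
    const n = Math.min(segmentLengthSamples, file.samplesReturned);
    for (let s = 0; s < n; s++)
      for (let c = 0; c < file.channels; c++)
        out[s * totalChannels + chOffset + c] = file.pcm[s * file.channels + c];
    chOffset += file.channels; // frames past n stay zero-padded
  }
  return { pcm: out, channels: totalChannels };
}

const a = { channels: 2, samplesReturned: 2, pcm: Float32Array.of(1, 2, 3, 4) }; // L,R per sample
const b = { channels: 1, samplesReturned: 2, pcm: Float32Array.of(9, 8) };
const { pcm } = stitch([a, b], 2);
console.log(Array.from(pcm)); // [1, 2, 9, 3, 4, 8]
```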
answered Oct 27 '25 by SAYAN MAITRA


Yes, sample accurate decoding is possible.

TL;DR: A seekpoint's frameOffset is not an absolute byte offset into the file, but an offset relative to the start of the first audio frame.

You have to add the size of the metadata header to the frameOffset in order to get the correct byte position to fetch from.
Unfortunately, every FLAC header has a different size. When we parse the metadata, we have to keep track of its size and add another field to our meta information that contains the desired absolute byte offset.
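As a minimal sketch of that correction (all numbers hypothetical):

```javascript
// A seekpoint's frameOffset is relative to the first audio frame, so the
// absolute fetch position adds the parsed header size. Numbers are made up.
const firstFrameOffset = 46_858;       // end of metadata blocks, from parsing
const seekPoint = { sample: 102_400, frameOffset: 250_000 };
const fileOffset = firstFrameOffset + seekPoint.frameOffset;
console.log(fileOffset); // 296858
```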

Here is an example of parsing the metadata. Note: please ignore the ReplayGain Vorbis-comment parsing; it is a necessity in my case, but not related to the question.

function parseFlacMetadata(buffer) {
    const dataView = new DataView(buffer);
    let pointer = 4; // skip 'fLaC'
    let streamInfo = {}, seekTable = [], replayGain = {};
    let firstFrameOffset = null;

    while (pointer < dataView.byteLength) {
        const header = dataView.getUint8(pointer);
        const isLast = (header & 0x80) !== 0;
        const type = header & 0x7F;
        const length = ((dataView.getUint8(pointer + 1) << 16) |
            (dataView.getUint8(pointer + 2) << 8) |
            dataView.getUint8(pointer + 3));
        pointer += 4;

        if (type === 0) { // STREAMINFO
            const minBlockSize = dataView.getUint16(pointer);
            const maxBlockSize = dataView.getUint16(pointer + 2);
            const minFrameSize = (dataView.getUint8(pointer + 4) << 16) |
                (dataView.getUint8(pointer + 5) << 8) |
                dataView.getUint8(pointer + 6);
            const maxFrameSize = (dataView.getUint8(pointer + 7) << 16) |
                (dataView.getUint8(pointer + 8) << 8) |
                dataView.getUint8(pointer + 9);
            const sr_hi = dataView.getUint8(pointer + 10);
            const sr_mid = dataView.getUint8(pointer + 11);
            const sr_lo = dataView.getUint8(pointer + 12);
            const bits1 = dataView.getUint8(pointer + 13);
            const bits2 = dataView.getUint8(pointer + 14);
            const bits3 = dataView.getUint8(pointer + 15);
            const bits4 = dataView.getUint8(pointer + 16);
            const bits5 = dataView.getUint8(pointer + 17);

            const sampleRate = ((sr_hi << 12) | (sr_mid << 4) | (sr_lo >> 4));
            const channels = ((sr_lo & 0x0E) >> 1) + 1;
            const bitsPerSample = (((sr_lo & 0x01) << 4) | ((bits1 & 0xF0) >> 4)) + 1;

            const totalSamples = (((BigInt(bits1 & 0x0F) << 32n) |
                (BigInt(bits2) << 24n) |
                (BigInt(bits3) << 16n) |
                (BigInt(bits4) << 8n) |
                BigInt(bits5)));

            streamInfo = {
                sampleRate,
                totalSamples: Number(totalSamples),
                channels,
                bitsPerSample
            };
        }

        if (type === 3) { // SEEKTABLE
            const count = Math.floor(length / 18);
            for (let i = 0; i < count; i++) {
                const offset = pointer + i * 18;

                if (offset + 18 > buffer.byteLength) break;

                const sampleBig = dataView.getBigUint64(offset);
                const frameOffset = Number(dataView.getBigUint64(offset + 8));
                const frameSamples = dataView.getUint16(offset + 16);

                // 0xFFFFFFFFFFFFFFFF marks a placeholder seekpoint; compare as
                // BigInt, since that value is not exactly representable as a Number
                if (sampleBig !== 0xFFFFFFFFFFFFFFFFn) {
                    seekTable.push({
                        sample: Number(sampleBig),
                        frameOffset,    // original relative offset
                        frameSamples,
                        fileOffset: null // fill in later
                    });
                }
            }
        }

        if (type === 4) { // VORBIS_COMMENT
            const view = new DataView(buffer, pointer, length);
            let offset = 0;
            const vendorLength = view.getUint32(offset, true);
            offset += 4 + vendorLength;

            const commentsCount = view.getUint32(offset, true);
            offset += 4;

            for (let i = 0; i < commentsCount; i++) {
                const len = view.getUint32(offset, true);
                offset += 4;

                const strBytes = new Uint8Array(buffer, pointer + offset, len);
                const str = new TextDecoder().decode(strBytes);
                offset += len;

                const [key, val] = str.split('=');
                if (key && val) {
                    replayGain[key.toUpperCase()] = val;
                }
            }
        }

        pointer += length;
        if (isLast) {
            firstFrameOffset = pointer; // <-- start of first audio frame
            break;
        }
    }

    // Post-process: add file offsets
    if (firstFrameOffset !== null) {
        for (const seekPoint of seekTable) {
            seekPoint.fileOffset = firstFrameOffset + seekPoint.frameOffset;
        }
    }

    return {
        streamInfo,
        seekTable,
        replayGain,
        fileSize: buffer.byteLength,
        firstFrameOffset
    };
}
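The STREAMINFO bit-unpacking above can be verified against a synthetic payload; this sketch encodes 44100 Hz, 2 channels, 16 bits per sample into the documented bit layout and decodes it back with the same shifts:

```javascript
// Quick check of the STREAMINFO bit layout used above, against a synthetic
// payload: 44100 Hz (20 bits), 2 channels (3 bits), 16 bits per sample (5 bits).
const b = new Uint8Array(18);
b[10] = 0x0A; b[11] = 0xC4; b[12] = 0x42; // 44100 = 0x0AC44; then (ch-1)=1 << 1; bps high bit 0
b[13] = 0xF0;                             // (bps-1)=15 low nibble goes in the top nibble
const v = new DataView(b.buffer);

const sr_lo = v.getUint8(12), bits1 = v.getUint8(13);
const sampleRate = (v.getUint8(10) << 12) | (v.getUint8(11) << 4) | (sr_lo >> 4);
const channels = ((sr_lo & 0x0E) >> 1) + 1;
const bitsPerSample = (((sr_lo & 0x01) << 4) | ((bits1 & 0xF0) >> 4)) + 1;
console.log(sampleRate, channels, bitsPerSample); // 44100 2 16
```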

The input of parseFlacMetadata must be a fetch of the first 128 KiB (enough for a SEEKTABLE of about 2530 seekpoints). If your FLAC metadata contains a cover image or other large blocks, increase the prefetch range Range: 'bytes=0-131071' accordingly.

const response = await fetch(url, {headers: {Range: 'bytes=0-131071'}});
const arrayBuffer = await response.arrayBuffer();
let meta = parseFlacMetadata(arrayBuffer);

console.log(meta);

Now you have to make use of the calculated absolute file offset in the question's rangeFor function ...

const byteStart = seekTable[indexA].fileOffset;            // instead of frameOffset
const nextOff = seekTable[indexB + 1]?.fileOffset ?? null; // instead of frameOffset

... the rest of the code in the question remains the same. And you’re good to go! :)

answered Oct 27 '25 by Gabriel Wolf


