I am writing an app to manipulate audio, and as a first step I need to convert a file (WAV, MP3, etc.) to raw data (samples represented as floats).
I use ffmpeg in cmd:
ffmpeg -i test.wav -f s16le -acodec pcm_s16le output.dat
How are samples represented in the output.dat file? I know one sample needs two bytes under s16, and two channels means the data is stored as L1 R1 L2 R2 ... But does this file have some kind of frame structure, or are all the bytes in the .dat file sample values? The sizes of the files converted from test.wav by two methods are not identical. One method uses libav with the example code from the ffmpeg website; the other is the command above, running ffmpeg.exe directly in cmd. The former gives me a slightly smaller file. I got confused when I found someone saying that PCM uses a frame structure (2048 samples per frame).
I actually do not need any code but hope someone can explain raw pcm format in detail.
Thanks a lot
A digital audio signal may be stored or transmitted. Digital audio can be stored on a CD, a digital audio player, a hard drive, a USB flash drive, or any other digital data storage device. The digital signal may be altered through digital signal processing, where it may be filtered or have effects applied.
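On a playback device's audio output setting, the difference between PCM and RAW is who does the decoding: RAW passes the encoded bitstream through for the receiving device to decode, while PCM means the device decodes the audio itself and outputs PCM samples.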
Uncompressed audio format: Although LPCM can be stored on a computer as a raw audio format, it is usually stored in a .wav file on Windows or in an .aiff file on macOS.
Pulse Code Modulation audio is a digital recording of analog audio used in a range of technologies from telephones to Blu-ray discs. PCM works by taking analog signal amplitude samples at regular intervals several thousand times per second.
Starting with a stereo WAV file with a bit depth of 16 bits at a 44.1 kHz (44,100 Hz) sample rate, you have a standard CD quality audio file. Issue this on the command line to display those stats for a file:
ffprobe Cesária_Évora.wav
typical output
  Duration: 00:00:21.51, bitrate: 1411 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
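The 1411 kb/s figure follows directly from the stream parameters. Here is a quick sketch of the arithmetic in Python. The variable names are mine, and the duration is the rounded value ffprobe printed, so the byte count is only approximate.

```python
sample_rate = 44100      # Hz, from the ffprobe output
bits_per_sample = 16     # s16
channels = 2             # stereo

bits_per_second = sample_rate * bits_per_sample * channels
print(bits_per_second)   # 1411200, which ffprobe reports as 1411 kb/s

# One "frame" in the interleaved sense: one sample for every channel
bytes_per_frame = channels * bits_per_sample // 8
print(bytes_per_frame)   # 4

duration = 21.51         # seconds, as rounded by ffprobe
approx_raw_bytes = round(duration * sample_rate) * bytes_per_frame
print(approx_raw_bytes)  # rough size of the headerless PCM dump
```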
to create a PCM file from the wav issue
ffmpeg -i Cesária_Évora.wav -f s16le -acodec pcm_s16le cesaria.dat
Be aware that a canonical WAV file is simply a 44 byte header followed by the payload, which is the raw audio curve in PCM format (files carrying extra metadata chunks have a longer header). The PCM file is strictly L1 R1 L2 R2, nothing more, nothing less. Any notion of frames is an abstraction of how we parse the data, with no bits dedicated to implementing a frame (like start/end markers). To write code that manipulates PCM data, keep in mind your bit depth as well as whether your file is little endian or big endian. With a bit depth of 8 bits you can safely ignore endianness, since you will never need to shift bytes. However, the file above has a bit depth of 16 bits, which means each point of the audio curve is represented by a single 16 bit number per channel (stereo is two channels, mono is one channel).
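The header-plus-payload layout can be checked with Python's standard wave module. This is a minimal sketch that writes a tiny stereo WAV into memory and compares it with the raw bytes; the sample values are made up for illustration. A real-world WAV file may contain additional chunks, so robust code should locate the data chunk rather than assume a fixed 44 byte offset.

```python
import io
import struct
import wave

# Two stereo frames of 16 bit little endian samples: L1 R1 L2 R2
raw = struct.pack("<4h", 1000, -1000, 2000, -2000)

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)       # bytes per sample
    w.setframerate(44100)
    w.writeframes(raw)

wav_bytes = buf.getvalue()
header_size = len(wav_bytes) - len(raw)
print(header_size)                      # 44 for this minimal file
print(wav_bytes[header_size:] == raw)   # True: the payload is the raw PCM
```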
When reading such a file, each 16 bit number is stored across two bytes. In a little endian file, the first byte you encounter as you iterate across the file is the least significant byte, followed by the more significant byte. So the sample sequence
L1 R1 L2 R2
which is the stereo representation of two 16 bit points on the audio curve, is stored as these individual bytes
Llittle1 Lbig1 Rlittle1 Rbig1 Llittle2 Lbig2 Rlittle2 Rbig2
Note that the above shows 8 bytes for those two points. Similarly, with a bit depth of 24 bits, one point on the audio curve across both channels would be stored as
Llittle1 Lbigger1 Lbiggest1 Rlittle1 Rbigger1 Rbiggest1
So conceptually, when reading a little endian file with a bit depth of 16 bits, these are the two bytes you read for one channel for one point on the raw audio curve
Llittle1 Lbig1
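Now, to generate the single value L1, you conceptually do this
L1 = ( Lbig1 << 8 ) | Llittle1
and then interpret the 16 bit result as a signed (two's complement) integer, since s16 samples range from -32768 to 32767.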
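The byte combination can be tried out in Python. This is a sketch with made-up byte values: it does the shift by hand for 16 bits and then uses int.from_bytes, which handles any byte width (including 24 bit samples) and the signed interpretation. The helper name decode_le is my own.

```python
import struct

def decode_le(sample_bytes: bytes) -> int:
    """Combine little endian bytes into one signed sample (any byte width)."""
    return int.from_bytes(sample_bytes, byteorder="little", signed=True)

# 16 bit by hand: the low byte comes first in the file, then the high byte
l_little, l_big = 0x34, 0x12
unsigned = (l_big << 8) | l_little
# Values with the top bit set are negative in two's complement
signed = unsigned - 0x10000 if unsigned & 0x8000 else unsigned

print(signed)                                       # 4660 (0x1234)
print(decode_le(bytes([0x34, 0x12])))               # 4660
print(decode_le(bytes([0xFF, 0xFF])))               # -1
print(decode_le(bytes([0x00, 0x00, 0x80])))         # -8388608, the 24 bit minimum
print(struct.unpack("<h", bytes([0x00, 0x80]))[0])  # -32768, same idea via struct
```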
Not sure if this is the level of abstraction you were looking for, but it is a stepping stone to nailing digital audio.
Audacity is a super helpful tool that lets you import a raw audio file in PCM format, such as the cesaria.dat we generated above: Audacity -> File -> Import -> Raw Data -> choose cesaria.dat -> then set the import options to match the file (signed 16 bit PCM, little endian, 2 channels, 44100 Hz).

-f s16le produces a raw sample dump with no header, trailer, or any other metadata. So it is simply L1 R1 C1 L2 R2 C2 ... where L, R, and C represent 3 channels.
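Since the dump is nothing but interleaved samples, splitting it into per-channel float arrays is short. This is a minimal sketch: the function name and test values are mine, and dividing by 32768 is one common convention for mapping s16 to floats, not the only one.

```python
import struct

def deinterleave_s16le(data: bytes, channels: int) -> list[list[float]]:
    """Split interleaved s16le bytes into per-channel lists of floats in [-1.0, 1.0)."""
    frame_size = 2 * channels
    usable = len(data) - len(data) % frame_size   # ignore a trailing partial frame
    count = usable // 2                           # total number of 16 bit samples
    samples = struct.unpack(f"<{count}h", data[:usable])
    # Every channels-th sample, starting at offset ch, belongs to channel ch
    return [[s / 32768.0 for s in samples[ch::channels]] for ch in range(channels)]

# Three channels, two frames: L1 R1 C1 L2 R2 C2
raw = struct.pack("<6h", 16384, -16384, 0, 32767, -32768, 8192)
left, right, centre = deinterleave_s16le(raw, channels=3)
print(left)    # [0.5, 0.999969482421875]
print(right)   # [-0.5, -1.0]
print(centre)  # [0.0, 0.25]
```

For a real file you would read the bytes with open("output.dat", "rb").read() and pass the channel count reported by ffprobe.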
When ffmpeg reads such a file, it reads and frames 1024 samples from each channel at a time, unless sampling rate/25 is less than 1024, in which case it reads and packetizes that many samples. For example, for a 16000 Hz stream, sampling rate/25 = 640, which is less than 1024, so ffmpeg will packetize 640x2 = 1280 samples for a stereo stream.
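The packet size rule described here can be written down as a small function. This is only a model of the rule as stated in this answer, with a hypothetical helper name; it is not taken from ffmpeg's source, and the behaviour may differ between ffmpeg versions.

```python
def samples_per_packet(sample_rate: int, default: int = 1024) -> int:
    """Per-channel samples in one packet, following the rule described above."""
    return min(default, sample_rate // 25)

for rate, channels in [(44100, 2), (16000, 2)]:
    per_channel = samples_per_packet(rate)
    print(rate, per_channel, per_channel * channels)
# 44100 1024 2048
# 16000 640 1280
```

Under this rule a 44100 Hz stereo stream gives 2048 samples per packet, which may be where the "2048 samples a frame" figure in the question comes from. Either way, this grouping exists only inside the reader; nothing in the .dat file marks it.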