I've been wondering for a while now: how exactly does file streaming work? By file streaming, I mean accessing parts of a file without loading the whole file into memory.
I believe the C++ classes (i|o)fstream do exactly that, but how is it implemented? Is it possible to implement file streaming yourself?
How does it work at the lowest C / C++ level (or in any language that supports file streaming)? Do the C functions fopen, fclose, and fread and the FILE* pointer already take care of streaming (i.e., not loading the whole file into memory)? If not, how would you read directly from the hard drive, and is there such a facility already implemented in C / C++?
Any links, hints, pointers in the right direction would already be very helpful. I've googled, but it seems Google doesn't quite understand what I'm after...
Ninja-Edit: If anybody knows anything about how this works at the assembly / machine code level, and whether it's possible to implement this yourself or you have to rely on system calls, that would be awesome. :) Not a requirement for an answer, though a link in the right direction would be nice.
At the lowest level (at least for userland code), you'll use system calls. On UNIX-like platforms, these include:
open
close
read
write
lseek
...and others. These work by passing around these things called file descriptors. File descriptors are just opaque integers. Inside the operating system, each process has a file descriptor table, containing all of the file descriptors and relevant information, such as which file it is, what kind of file it is, etc.
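As a rough illustration of how these calls fit together, here is a minimal sketch that reads only a slice of a file. The helper name read_range is made up for this example (it is not a standard function), and error handling is kept deliberately simple.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `count` bytes starting at byte `offset`
 * of the file at `path` into `buf`.
 * Returns the number of bytes actually read, or -1 on error. */
ssize_t read_range(const char *path, off_t offset, void *buf, size_t count)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);              /* obtain a file descriptor */
    if (fd < 0)
        return -1;

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {  /* move the file position */
        close(fd);
        return -1;
    }

    size_t total = 0;
    while (total < count) {
        /* read may return fewer bytes than requested; 0 means end of file */
        ssize_t n = read(fd, (char *)buf + total, count - total);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break;
        total += (size_t)n;
    }

    close(fd);
    return (ssize_t)total;
}
```

Only the requested slice ever lands in the caller's buffer; the rest of the file is never copied into the process.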
There are also Windows API calls similar to system calls on UNIX:
CreateFile
CloseHandle
ReadFile / ReadFileEx
WriteFile / WriteFileEx
SetFilePointer / SetFilePointerEx
Windows passes around HANDLEs, which are similar to file descriptors but are, I believe, a little less flexible (on UNIX, for example, file descriptors can represent not only files but also sockets, pipes, and other things).
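The C standard library functions fopen, fclose, fread, fwrite, and fseek are wrappers around these system calls. The main thing they add is a userspace buffer inside the FILE object, so that many small reads or writes turn into fewer system calls.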
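To address the "can you implement this yourself" part: the wrapper idea can be sketched in a few lines on top of read. Everything named my_* below is hypothetical and only loosely imitates what a real FILE does; actual C library implementations are considerably more involved (writing, seeking, locking, error flags).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define MY_BUFSIZE 4096   /* arbitrary buffer size chosen for this sketch */

/* Hypothetical stand-in for FILE: a descriptor plus a userspace buffer. */
struct my_file {
    int fd;
    size_t pos;                      /* index of the next unread byte in buf */
    size_t len;                      /* number of valid bytes in buf */
    unsigned char buf[MY_BUFSIZE];
};

/* Roughly the read-only part of fopen: get a descriptor, start with an empty buffer. */
int my_open(struct my_file *f, const char *path)
{
    assert(f != NULL && path != NULL);
    f->fd = open(path, O_RDONLY);
    f->pos = 0;
    f->len = 0;
    return f->fd < 0 ? -1 : 0;
}

/* Roughly fgetc: hand out one byte, asking the kernel for more data
 * only when the buffer has been used up. Returns -1 at EOF or on error. */
int my_getc(struct my_file *f)
{
    assert(f != NULL);
    if (f->pos == f->len) {
        ssize_t n = read(f->fd, f->buf, sizeof f->buf);
        if (n <= 0)
            return -1;
        f->len = (size_t)n;
        f->pos = 0;
    }
    return f->buf[f->pos++];
}

/* Roughly fclose for a read-only stream: nothing to flush, just release the descriptor. */
int my_close(struct my_file *f)
{
    assert(f != NULL);
    return close(f->fd);
}
```

However large the file is, at most MY_BUFSIZE bytes of it sit in the process at any moment, which is the whole trick behind streaming.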
When you open a file, usually none of the file's contents is read into memory. When you use fread or read, you tell the operating system to read a particular number of bytes into a buffer. This particular number of bytes can be, but does not have to be, the length of the file. As such, you can read only part of a file into memory, if desired.
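For instance, a file of any size can be processed through a small fixed buffer. The sketch below counts newline characters 4 KiB at a time; count_newlines and the chunk size are illustrative choices, not anything prescribed by the C standard.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count the '\n' bytes in a file while holding
 * at most 4 KiB of its contents in our own buffer at any time.
 * Returns the count, or -1 if the file cannot be opened or read. */
long count_newlines(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    char chunk[4096];
    long lines = 0;
    size_t n;

    /* fread returns how many bytes it actually delivered; 0 ends the loop */
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (chunk[i] == '\n')
                lines++;
        }
    }

    int failed = ferror(fp);   /* distinguish a read error from end of file */
    fclose(fp);
    return failed ? -1 : lines;
}
```

The same loop works for a 10-byte file and a 10-gigabyte one; only the number of iterations changes.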
You asked how this works at the machine code level. I can only really explain how this works on Linux and the Intel 32-bit architecture. When you make a system call, the system call number and the arguments are placed into registers. After that, interrupt 0x80 is raised. So, for example, to read one kilobyte from stdin (file descriptor 0) into the buffer at address 0xDEADBEEF, you might use this assembly code:
mov eax, 0x03 ; system call number (read = 0x03)
mov ebx, 0 ; file descriptor (stdin = 0)
mov ecx, 0xDEADBEEF ; buffer address
mov edx, 1024 ; number of bytes to read
int 0x80 ; Linux system call interrupt
int 0x80 raises a software interrupt for which the operating system has registered a handler in the interrupt descriptor table (the protected-mode successor to the interrupt vector table). The processor switches to kernel mode and jumps to that handler, which does the equivalent of C's switch on eax. From there, it jumps into the implementation of read. In read, the kernel looks up the descriptor in the calling process's file descriptor table to find out what it refers to. Once it has all the data it needs, it does its stuff, puts the return value in eax, and returns to the user code.
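If you'd rather stay in C, glibc on Linux offers a generic syscall function that issues a system call by number, which is about as close to the raw mechanism as C gets. The wrapper name raw_read below is made up for this sketch. Note that on x86-64 the library uses the dedicated syscall instruction rather than int 0x80, and the number for read is 0 there instead of 3; the SYS_read constant hides that difference.

```c
#define _GNU_SOURCE          /* makes unistd.h declare syscall() on glibc */
#include <assert.h>
#include <sys/syscall.h>     /* SYS_read: the system call number for read */
#include <unistd.h>

/* Hypothetical wrapper: invoke the read system call by number instead of
 * going through the usual read() library function. Linux-specific.
 * Returns the number of bytes read, or -1 on error (errno is set). */
long raw_read(int fd, void *buf, unsigned long count)
{
    assert(buf != NULL);
    /* syscall() loads the number and arguments into the registers the
     * kernel expects and then traps into kernel mode */
    return syscall(SYS_read, fd, buf, count);
}
```

Calling raw_read(0, buf, 1024) would be the C counterpart of the assembly snippet above, with buf taking the place of the hard-coded address.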
To "do its stuff", let's assume it's reading from disk, and not a pipe or stdin or some other non-physical place. Let's also assume it's reading from the primary hard disk. Also, let's assume the operating system can still access the BIOS interrupts.
To access the file, it needs to do a bunch of filesystem things, for example traversing the directory tree to find where the actual file is. I'm not going to cover this much, since I bet you can guess.
The interesting part is reading data from the disk, whether it be filesystem metadata, file contents, or something else. First, you get a logical block address (LBA). An LBA is just an index of a block of data on the disk. Each block is traditionally 512 bytes (although this figure may be dated; many newer drives use 4096-byte sectors). Still assuming we have access to the BIOS and the OS uses it, it will then convert the LBA to CHS notation. CHS (Cylinder-Head-Sector) notation is another way to reference parts of the hard drive. It used to correspond to the physical layout of the disk; nowadays it's outdated, but almost every BIOS still supports it. From there, the OS will stuff data into registers and trigger interrupt 0x13, the BIOS's disk-reading interrupt.
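The LBA-to-CHS conversion is plain integer arithmetic once you know the drive geometry. Here is a minimal sketch using the commonly cited formula; the struct and function names are invented for this example, and the geometry (heads per cylinder, sectors per track) is assumed to be supplied by the caller, since in practice it would come from the BIOS or the drive.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for a cylinder/head/sector address. */
struct chs {
    uint32_t cylinder;   /* counted from 0 */
    uint32_t head;       /* counted from 0 */
    uint32_t sector;     /* counted from 1, by CHS convention */
};

/* Convert a logical block address to CHS for the given drive geometry. */
struct chs lba_to_chs(uint32_t lba,
                      uint32_t heads_per_cylinder,
                      uint32_t sectors_per_track)
{
    assert(heads_per_cylinder > 0 && sectors_per_track > 0);

    struct chs out;
    out.cylinder = lba / (heads_per_cylinder * sectors_per_track);
    out.head     = (lba / sectors_per_track) % heads_per_cylinder;
    out.sector   = (lba % sectors_per_track) + 1;
    return out;
}
```

With an assumed geometry of 16 heads and 63 sectors per track, LBA 0 maps to cylinder 0, head 0, sector 1, and LBA 63 (the first block of the second track) maps to cylinder 0, head 1, sector 1.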
That's the lowest level I can explain, and I'm sure the part after I assumed the operating system used the BIOS is outdated. Everything before that is, I believe, how it still works, albeit at a simplified level.