I've read that pipes need to have a limited capacity. But I don't understand why. What happens if a process writes into a pipe without a limit?
If the pipe buffer is full before all bytes are written, WriteFile does not return until another process or thread uses ReadFile to make more buffer space available.
Since Linux 2.6. 11, the pipe capacity is 16 pages (i.e., 65,536 bytes in a system with a page size of 4096 bytes). Since Linux 2.6. 35, the default pipe capacity is 16 pages, but the capacity can be queried and set using the fcntl(2) F_GETPIPE_SZ and F_SETPIPE_SZ operations.
Plumbing is a piping system with which most people are familiar, as it constitutes the form of fluid transportation that is used to provide potable water and fuels to their homes and businesses. Plumbing pipes also remove waste in the form of sewage, and allow venting of sewage gases to the outdoors.
The number of file descriptors that can be open at a given time is limited. If you keep opening pipes and not closing them pretty soon you'll run out of FDs and can't open anything anymore: not pipes, not files, not sockets, ...
It's due to buffering. Pipes are not "magical", pipes do not ensure all processes process each individual byte or character in lockstep. Instead pipes buffer inter-process output and then pass the buffer along. And this buffer size limit is what you're referring to. In many Linux distros and in macOS the buffer size is 64KiB.
Imagine there's a process that outputs 1GB of data every second to stdout - and it's piped to another process that can only process 100 bytes of data every minute on stdin - consider that those gigabytes of data have to go somewhere. If there was an infinitely sized buffer than you would quickly fill up the memory-space of whatever OS component owns the pipe and then start paging out to disk - and then your pagefile on disk would fill up - and that's not good.
By having maximum buffer sizes, the output process will be notified when it's filled the buffer and it's free to handle that event however is appropriate (e.g. by pausing output if it's a random number generator, by dropping data if it's a network monitor, by crashing, etc).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With