
What is the use of buffering in python's built-in open() function?

Python Documentation : https://docs.python.org/2/library/functions.html#open

open(name[, mode[, buffering]])   

The above documentation says "The optional buffering argument specifies the file’s desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default. If omitted, the system default is used."
When I use

filedata = open("file.txt", "r", 0)

or

filedata = open("file.txt", "r", 1)

or

filedata = open("file.txt", "r", 2)

or

filedata = open("file.txt", "r", -1)

or

filedata = open("file.txt", "r")

The output does not change. Each of the calls shown above prints at the same speed.
Output:

Mr. Bean is a British television programme series of fifteen 25-minute episodes written by Robin Driscoll and starring Rowan Atkinson as the title character. Different episodes were also written by Robin Driscoll and Richard Curtis, and one by Ben Elton. Thirteen of the episodes were broadcast on ITV, from the pilot on 1 January 1990, until "Goodnight Mr. Bean" on 31 October 1995. A clip show, "The Best Bits of Mr. Bean", was broadcast on 15 December 1995, and one episode, "Hair by Mr. Bean of London", was not broadcast until 2006 on Nickelodeon.

So how is the buffering parameter of the open() function useful? What value of the buffering parameter is best to use?

Srivishnu asked Apr 18 '15 03:04


1 Answer

Enabling buffering means that you're not interfacing directly with the OS's representation of a file or its file system API. Instead, a chunk of data is read from the raw OS file stream into a buffer and served from there; once the buffer has been consumed, more data is fetched into it. In terms of the objects you get with the io module (io.open() in Python 2.6+, or the built-in open() in Python 3), you'll get a BufferedIOBase object wrapping an underlying RawIOBase, which represents the raw file stream. Python 2's built-in open() returns a plain file object instead, but the buffering idea is the same.
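As a rough illustration of that layering, here is a minimal sketch that assumes Python 3, where the built-in open() is io.open(). The helper names layer_names and make_sample_file are made up for this example, and the sample file is throwaway data. Note that in Python 3, buffering=0 is only accepted in binary mode.

```python
import io
import os
import tempfile


def make_sample_file():
    """Create a small throwaway text file (hypothetical sample data)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write("Mr. Bean\n")
    return path


def layer_names(path):
    """Return the I/O class names that Python 3's open() stacks for `path`."""
    names = {}
    # buffering=0 (binary mode only) hands back the raw, unbuffered stream.
    with open(path, "rb", buffering=0) as raw:
        names["binary, buffering=0"] = [type(raw).__name__]
    # Default buffering in binary mode wraps the raw stream in a buffer.
    with open(path, "rb") as buffered:
        names["binary, default"] = [
            type(buffered).__name__,
            type(buffered.raw).__name__,
        ]
    # Text mode adds a decoding layer on top of the buffered stream.
    with open(path, "r") as text:
        names["text, default"] = [
            type(text).__name__,
            type(text.buffer).__name__,
            type(text.buffer.raw).__name__,
        ]
    return names


if __name__ == "__main__":
    sample = make_sample_file()
    for mode, layers in layer_names(sample).items():
        print(mode, "->", " wrapping ".join(layers))
    os.remove(sample)
```

The outermost object is listed first, so each entry reads as "this object wraps the next one".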

What is the benefit of this? Well, interfacing with the raw stream can have high latency, because the operating system has to deal with physical devices like the hard disk, and this may not be acceptable in all cases. Let's say you want to read three letters from a file every 5 ms and your file is on a crusty old hard disk, or even a network file system. Instead of hitting the raw file stream every 5 ms, it is better to load a bunch of bytes from the file into a buffer in memory and then consume them at will.
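One way to see the effect of the three-letters-at-a-time scenario is to count how often the buffer layer has to go back to the raw stream. The sketch below assumes Python 3; CountingFileIO, count_raw_reads and make_sample_file are hypothetical names invented for this example, and the 3000-byte sample file is arbitrary.

```python
import io
import os
import tempfile


class CountingFileIO(io.FileIO):
    """Raw file stream that counts how often the buffer layer asks it for data."""

    def __init__(self, *args, **kwargs):
        self.raw_reads = 0
        super().__init__(*args, **kwargs)

    def readinto(self, b):
        # BufferedReader refills itself by calling readinto() on the raw stream.
        self.raw_reads += 1
        return super().readinto(b)


def count_raw_reads(path, buffer_size, chunk=3):
    """Read `path` in `chunk`-byte pieces and return the number of raw reads."""
    raw = CountingFileIO(path, "rb")
    with io.BufferedReader(raw, buffer_size=buffer_size) as reader:
        while reader.read(chunk):
            pass
    return raw.raw_reads


def make_sample_file(size=3000):
    """Create a throwaway binary file of `size` bytes (hypothetical data)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * size)
    return path


if __name__ == "__main__":
    sample = make_sample_file()
    print("buffer of 3 bytes:   ", count_raw_reads(sample, 3), "raw reads")
    print("buffer of 4096 bytes:", count_raw_reads(sample, 4096), "raw reads")
    os.remove(sample)
```

With a 3-byte buffer every 3-byte read has to reach the raw stream, while a buffer larger than the file should need far fewer trips. The data returned to the caller is the same in both cases.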

The buffer size you choose will depend on how you're consuming the data. For the example above, a buffer size of 1 char would be awful, 3 chars would be alright, and any large multiple of 3 chars that doesn't cause a noticeable delay for your users would be ideal.
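This also suggests why the question sees no difference between the calls: buffering changes how the data is fetched, not what is read, so for a small file read once the output is identical and any timing difference is hard to notice. A minimal sketch, assuming Python 3 and binary mode (where buffering=0 is allowed, whereas buffering=1 means line buffering and applies to text mode only); read_all and make_sample_file are hypothetical helpers for this example.

```python
import io
import os
import tempfile


def make_sample_file():
    """Create a throwaway file with a few lines (hypothetical sample data)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(b"Mr. Bean is a British television programme series.\n" * 20)
    return path


def read_all(path, buffering):
    """Read the whole file in binary mode with the given buffering value."""
    with open(path, "rb", buffering=buffering) as f:
        return f.read()


if __name__ == "__main__":
    sample = make_sample_file()
    # 0 = unbuffered, 2 and 4096 = explicit sizes in bytes, -1 = default.
    results = {b: read_all(sample, b) for b in (0, 2, 4096, -1)}
    print("identical contents:", len(set(results.values())) == 1)
    print("default buffer size hint:", io.DEFAULT_BUFFER_SIZE)
    os.remove(sample)
```

If no particular access pattern is known, leaving buffering at its default is a reasonable starting point; io.DEFAULT_BUFFER_SIZE shows the fallback size the io module uses.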

Asad Saeeduddin answered Oct 07 '22 19:10