What is the use of buffering in python's built-in open() function?

Tags:

python-2.7

Python documentation: https://docs.python.org/2/library/functions.html#open

open(name[, mode[, buffering]])

The above documentation says "The optional buffering argument specifies the file’s desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default. If omitted, the system default is used."
When I use

filedata = open("file.txt", "r", 0)

filedata = open("file.txt", "r", 1)

filedata = open("file.txt", "r", 2)

filedata = open("file.txt", "r", -1)

filedata = open("file.txt", "r")

The output does not change. Whichever of the lines above I use, the file contents print at the same speed.
Output:

Mr. Bean is a British television programme series of fifteen 25-minute episodes written by Robin Driscoll and starring Rowan Atkinson as the title character. Different episodes were also written by Robin Driscoll and Richard Curtis, and one by Ben Elton. Thirteen of the episodes were broadcast on ITV, from the pilot on 1 January 1990, until "Goodnight Mr. Bean" on 31 October 1995. A clip show, "The Best Bits of Mr. Bean", was broadcast on 15 December 1995, and one episode, "Hair by Mr. Bean of London", was not broadcast until 2006 on Nickelodeon.

So how is the buffering parameter of the open() function useful? What value of the buffering parameter is best to use?

325

asked Apr 18 '15 03:04

Srivishnu

1 Answer

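Enabling buffering means that you're not directly interfacing with the OS's representation of a file, or its file system API. Instead, a chunk of data is read from the raw OS filestream into a buffer and held there until it is consumed, at which point more data is fetched into the buffer. In terms of the objects you get, with the io module (io.open in Python 2.7, or the built-in open() in Python 3) a file opened in binary mode gives you a BufferedIOBase object wrapping an underlying RawIOBase (which represents the raw file stream). The built-in open() in Python 2.7 returns a plain file object instead, but the buffering idea is the same.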
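To see that layering for yourself, here is a minimal sketch. It assumes you use io.open in binary mode (not the Python 2.7 built-in open(), which returns a plain file object), and the temporary file is just a throwaway created for the demonstration.

```python
import io
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp()
os.write(fd, b"abcdefghi")
os.close(fd)

# buffering=0 is only allowed in binary mode and hands back the raw stream.
raw = io.open(path, "rb", buffering=0)
raw_is_raw = isinstance(raw, io.RawIOBase)
raw.close()

# Default buffering hands back a buffered object that wraps a raw stream.
buffered = io.open(path, "rb")
buffered_is_buffered = isinstance(buffered, io.BufferedIOBase)
wraps_raw = isinstance(buffered.raw, io.RawIOBase)
buffered.close()

os.remove(path)
print(raw_is_raw, buffered_is_buffered, wraps_raw)
```

All three checks should come out true: the unbuffered handle is the raw stream itself, while the default handle is a buffer sitting on top of one.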

What is the benefit of this? Well, interfacing with the raw stream might have high latency, because the operating system has to fool around with physical objects like the hard disk, and this may not be acceptable in all cases. Let's say you want to read three letters from a file every 5ms and your file is on a crusty old hard disk, or even a network file system. Instead of trying to read from the raw filestream every 5ms, it is better to load a bunch of bytes from the file into a buffer in memory, then consume it at will. This is also why you see no difference with a small local file: it is read so quickly that the buffer size has no visible effect.

What size of buffer you choose will depend on how you're consuming the data. For the example above, a buffer size of 1 char would be awful, 3 chars would be alright, and any large multiple of 3 chars that doesn't cause a noticeable delay for your users would be ideal.
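As a rough illustration of the three-letters example, the sketch below counts how many times the raw stream is actually hit for different buffer sizes. CountingRaw and raw_reads_needed are hypothetical helpers written for this demonstration, the "file" is just bytes in memory, and the exact counts depend on the interpreter's buffering implementation, so only the trend matters.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory raw stream that counts how often it is read."""

    def __init__(self, data):
        super(CountingRaw, self).__init__()
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Every call here stands in for one trip to the slow device.
        self.calls += 1
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def raw_reads_needed(buffer_size, data=b"abc" * 100):
    """Read the data three bytes at a time and report the raw read count."""
    raw = CountingRaw(data)
    reader = io.BufferedReader(raw, buffer_size=buffer_size)
    while reader.read(3):
        pass
    return raw.calls


tiny = raw_reads_needed(1)     # buffer far too small to help
large = raw_reads_needed(300)  # buffer holds the whole "file"
print(tiny, large)
```

With the large buffer, nearly all of the three-byte reads are served from memory, so the raw stream is touched far less often than with the tiny buffer.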

169

answered Oct 07 '22 19:10

Asad Saeeduddin
