I'm trying to decide whether to use input() or sys.stdin.readline() when I need to read lines of input from STDIN, and I wonder how to choose between them in different situations.
I found a previous post (https://codereview.stackexchange.com/questions/23981/how-to-optimize-this-simple-python-program) saying that:
How can I optimize this code in terms of time and memory used? Note that I'm using different function to read the input, as sys.stdin.readline() is the fastest one when reading strings and input() when reading integers.
Is that statement true?
In Python, the readlines() method reads the entire stream, and then splits it up at the newline character and creates a list of each line.
sys.stdin is a file-like object: call read() on it to get everything as one string, or readlines() to get everything split into a list of lines. (You need to import sys for this to work.) If you want to prompt the user for input, you can use raw_input in Python 2 or input in Python 3.
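As a rough illustration of the difference between read and readlines, here is a small sketch that uses io.StringIO as a stand-in for sys.stdin (an assumption made only so the example runs without real piped input; both objects offer the same two methods).

```python
import io

# Stand-in for sys.stdin so the example does not need piped input.
stream = io.StringIO("one\ntwo\n")

whole = stream.read()         # the entire stream as a single string
stream.seek(0)                # rewind; a real pipe cannot be rewound
as_list = stream.readlines()  # one list entry per line, newlines kept

print(repr(whole))
print(as_list)
```

Note that readlines keeps the trailing newline on each entry, so you may still want to strip it.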
The builtin input and sys.stdin.readline functions don't do exactly the same thing, and which one is faster may depend on the details of exactly what you're doing. As aruisdante commented, the difference is smaller in Python 3 than it was in Python 2, which is when the quote you provide was written, but there are still some differences.
The first difference is that input has an optional prompt parameter, which is written to standard output before the line is read. This leads to some overhead, even if the prompt is empty (the default). On the other hand, it may be faster than doing a print before each readline call, if you do want a prompt.
The next difference is that input strips off any newline from the end of the input. If you're going to strip that anyway, it may be faster to let input do it for you, rather than doing sys.stdin.readline().strip().
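To make the newline behaviour concrete, here is a minimal sketch. The helper names read_with_readline and read_with_input are made up for this example, and sys.stdin is temporarily replaced with an io.StringIO so it runs without real input.

```python
import io
import sys

def read_with_readline(stream):
    """Return one line exactly as readline() gives it (newline kept)."""
    return stream.readline()

def read_with_input(text):
    """Feed `text` to input() by temporarily replacing sys.stdin."""
    saved = sys.stdin
    sys.stdin = io.StringIO(text)
    try:
        return input()
    finally:
        sys.stdin = saved

raw = read_with_readline(io.StringIO("hello\n"))
clean = read_with_input("hello\n")
print(repr(raw))    # 'hello\n'
print(repr(clean))  # 'hello'
```

One thing to keep in mind: input only removes the trailing newline, while strip() removes all leading and trailing whitespace. If you want readline to match input exactly, rstrip("\n") is closer than strip().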
A final difference is how the end of the input is indicated. input will raise an EOFError when you call it if there is no more input (stdin has been closed on the other end). sys.stdin.readline, on the other hand, will return an empty string at EOF, which you need to know to check for.
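The two EOF conventions lead to slightly different loops. The sketch below shows both; the function names are invented for illustration, and io.StringIO again stands in for real standard input.

```python
import io
import sys

def lines_via_readline(stream):
    """Read until readline() returns '' (EOF)."""
    lines = []
    while True:
        line = stream.readline()
        if line == "":  # EOF; a blank line would be "\n", not ""
            break
        lines.append(line.rstrip("\n"))
    return lines

def lines_via_input(text):
    """Read until input() raises EOFError."""
    saved = sys.stdin
    sys.stdin = io.StringIO(text)
    lines = []
    try:
        while True:
            try:
                lines.append(input())
            except EOFError:
                break
    finally:
        sys.stdin = saved
    return lines

print(lines_via_readline(io.StringIO("a\n\nb\n")))
print(lines_via_input("a\n\nb\n"))
```

Both loops treat an empty line in the middle of the input as data rather than as the end of the input.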
There's also a third option, using the file iteration protocol on sys.stdin. This is likely to behave much like calling readline repeatedly, but the loop logic is perhaps nicer.
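A short sketch of the iteration approach follows. count_nonblank is a hypothetical helper, and the stream parameter exists only so the example can be run against an io.StringIO instead of real standard input.

```python
import io
import sys

def count_nonblank(stream=None):
    """Count lines that contain something other than whitespace."""
    if stream is None:
        stream = sys.stdin  # looked up at call time
    count = 0
    for line in stream:     # the loop ends by itself at EOF
        if line.strip():
            count += 1
    return count

print(count_nonblank(io.StringIO("a\n\nb\n")))  # 2
```

There is no explicit EOF check here: the for loop simply stops when the stream is exhausted. Each line still carries its trailing newline, as with readline.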
I suspect that while differences in performance between your various options may exist, they're likely to be smaller than the time cost of simply reading the file from the disk (if it is large) and doing whatever you are doing with it. I suggest that you avoid the trap of premature optimization and just do what is most natural for your problem, and if the program is too slow (where "too slow" is very subjective), you do some profiling to see what is taking the most time. Don't put a whole lot of effort into deciding between the different ways of taking input unless it actually matters.
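If you do get to the point of profiling, one possible starting point is the standard library's cProfile module. This is only a sketch: profile_call and read_all_lines are names invented for the example, and the workload is a placeholder for whatever your program really does with its input.

```python
import cProfile
import io
import pstats

def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    return result, buffer.getvalue()

def read_all_lines(stream):
    # Placeholder workload: read every line and drop the newline.
    return [line.rstrip("\n") for line in stream]

lines, report = profile_call(read_all_lines, io.StringIO("x\n" * 100_000))
print(report)
```

The report shows where the time goes, which tells you whether the input call is worth optimizing at all.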
As Linn1024 says, for reading large amounts of data input() is much slower.
A simple example is this:
import sys

for i in range(int(sys.argv[1])):
    sys.stdin.readline()
This takes about 0.25μs per iteration:
$ time yes | py readline.py 1000000
yes 0.05s user 0.00s system 22% cpu 0.252 total
Changing that to sys.stdin.readline().strip() takes that to about 0.31μs.
Changing readline() to input() is about 10 times slower:
$ time yes | py input.py 1000000
yes 0.05s user 0.00s system 1% cpu 2.855 total
Notice that it's still pretty fast though (under 3μs per call), so you only really need to worry when you are reading a very large number of entries, like the million above.