I would like to read a CSV file from standard input and process each row as it arrives. My CSV-writing code outputs rows one by one, but my reader waits for the stream to be terminated before iterating over the rows. Is this a limitation of the csv module? Am I doing something wrong?
My reader code:
    import csv
    import sys
    import time

    reader = csv.reader(sys.stdin)
    for row in reader:
        print "Read: (%s) %r" % (time.time(), row)
My writer code:
    import csv
    import sys
    import time

    writer = csv.writer(sys.stdout)
    for i in range(8):
        writer.writerow(["R%d" % i, "$" * (i+1)])
        sys.stdout.flush()
        time.sleep(0.5)
Output of python test_writer.py | python test_reader.py:

    Read: (1309597426.3) ['R0', '$']
    Read: (1309597426.3) ['R1', '$$']
    Read: (1309597426.3) ['R2', '$$$']
    Read: (1309597426.3) ['R3', '$$$$']
    Read: (1309597426.3) ['R4', '$$$$$']
    Read: (1309597426.3) ['R5', '$$$$$$']
    Read: (1309597426.3) ['R6', '$$$$$$$']
    Read: (1309597426.3) ['R7', '$$$$$$$$']
As you can see, all the print statements are executed at the same time, but I expect a 500 ms gap between them.
As it says in the documentation,

"In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer."
And you can see by looking at the implementation of the csv module (line 784) that csv.reader calls the next() method of the underlying iterator (via PyIter_Next).
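To check that the delay comes from the file object's iterator rather than from csv.reader itself, here is a minimal sketch in Python 3 syntax. The helper `logged_lines` is hypothetical (it is not part of the csv module); it simply records each line at the moment the reader pulls it from the underlying iterator:

```python
import csv

def logged_lines(lines, log):
    """Yield each line unchanged, recording it in `log` when it is consumed."""
    for line in lines:
        log.append(line)
        yield line

log = []
reader = csv.reader(logged_lines(["R0,$\r\n", "R1,$$\r\n"], log))

# Ask for a single row and note how many lines the reader had to pull for it.
first_row = next(reader)
lines_pulled_for_first_row = len(log)

# Consume the rest so the remaining line is pulled as well.
remaining_rows = list(reader)
```

For simple single-line records like these, the reader should pull only as many lines as it needs to complete one row, which suggests that any read-ahead has to happen in whatever iterator it is handed.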
So if you really want unbuffered reading of CSV files, you need to convert the file object (here sys.stdin) into an iterator whose next() method actually calls readline() instead. This can easily be done using the two-argument form of the iter function. So change the code in test_reader.py to something like this:
    for row in csv.reader(iter(sys.stdin.readline, '')):
        print("Read: ({}) {!r}".format(time.time(), row))
For example,
    $ python test_writer.py | python test_reader.py
    Read: (1388776652.964925) ['R0', '$']
    Read: (1388776653.466134) ['R1', '$$']
    Read: (1388776653.967327) ['R2', '$$$']
    Read: (1388776654.468532) ['R3', '$$$$']
    [etc]
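For completeness, here is a self-contained sketch of the reader built around the same two-argument iter call. The names `iter_rows_unbuffered` and `main` are my own, and the in-memory stream only stands in for sys.stdin so the snippet can be tried without a pipe. Note that the read-ahead behaviour quoted above is documented for Python 2 file objects; on Python 3, plain iteration over sys.stdin may already behave differently, so treat this as an illustration of the technique rather than a required fix.

```python
import csv
import io
import sys
import time

def iter_rows_unbuffered(stream):
    """Return a csv.reader that fetches one line at a time via readline()."""
    # iter(callable, sentinel) keeps calling stream.readline() until it
    # returns '' (end of file for a text stream).
    return csv.reader(iter(stream.readline, ''))

def main(stream=sys.stdin):
    # Print each row with the time at which it was read.
    for row in iter_rows_unbuffered(stream):
        print("Read: ({}) {!r}".format(time.time(), row))

# Small self-check using an in-memory text stream in place of sys.stdin.
sample = io.StringIO("R0,$\r\nR1,$$\r\n")
sample_rows = list(iter_rows_unbuffered(sample))
```

Calling main() with no arguments reads from sys.stdin, so it can replace the loop in test_reader.py. If the stream were opened in binary mode, the sentinel would presumably need to be b'' instead of ''.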
Can you explain why you need unbuffered reading of CSV files? There might be a better solution to whatever it is you are trying to do.