I have a csv DictReader object (using Python 3.1), and I would like to know the number of lines/rows contained in the reader before I iterate through it. Something like the following:
    myreader = csv.DictReader(open('myFile.csv', newline=''))
    totalrows = ?
    rowcount = 0
    for row in myreader:
        rowcount += 1
        print("Row %d/%d" % (rowcount, totalrows))
I know I could get the total by iterating through the reader, but then I couldn't run the 'for' loop. I could iterate through a copy of the reader, but I cannot find how to copy an iterator.
I could also use
    totalrows = len(open('myFile.csv').readlines())
but that seems an unnecessary re-opening of the file. I would rather get the count from the DictReader if possible.
Any help would be appreciated.
Alan
Using the len() function: one approach is to read the CSV file with the pandas library and call len() on the resulting DataFrame, which returns the number of rows in the file as an int.
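A minimal sketch of that approach (assuming pandas is installed; note that pandas does not support Python 3.1, which the question targets):

    import pandas as pd

    df = pd.read_csv('myFile.csv')
    totalrows = len(df)   # data rows only; the header row is not counted
    print(totalrows)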
Another approach avoids building a list (or a DataFrame) entirely, which saves memory:

    import csv

    def read_raw_csv(file_name):
        with open(file_name, newline='') as file:
            csvreader = csv.reader(file)
            # count the rows without materializing them
            entry_count = sum(1 for row in csvreader)
            print(entry_count - 1)  # -1 discards the header row
csv.reader() gives you each row as a list that you access by index and is ideal for simple CSV files. csv.DictReader(), on the other hand, maps each row to a dict keyed by the header row, which is friendlier and easier to use, especially when you care about column names rather than positions.
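A minimal sketch of the difference (the 'name' column is a hypothetical entry in the header row):

    import csv

    with open('myFile.csv', newline='') as f:
        for row in csv.reader(f):       # every row, header included, as a list
            print(row[0])

    with open('myFile.csv', newline='') as f:
        for row in csv.DictReader(f):   # header consumed; rows keyed by it
            print(row['name'])          # 'name' is an assumed column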
You could read the whole file into a list first:

    rows = list(myreader)
    totalrows = len(rows)
    for i, row in enumerate(rows):
        print("Row %d/%d" % (i + 1, totalrows))
You only need to open the file once:
    import csv

    f = open('myFile.csv', newline='')  # text mode; 'rb' was the Python 2 idiom
    countrdr = csv.DictReader(f)
    totalrows = 0
    for row in countrdr:
        totalrows += 1

    f.seek(0)  # rewind so a fresh DictReader re-reads the header
    myreader = csv.DictReader(f)
    for row in myreader:
        do_work
No matter what you do, you have to make two passes (well, if your records are a fixed length, which is unlikely, you could just get the file size and divide, but let's presume that isn't the case). Opening the file again really doesn't cost you much, but you can avoid it as illustrated here. Converting to a list just to use len() is potentially going to waste tons of memory, and not be any faster.
Note: The 'Pythonic' way is to use enumerate instead of +=, but the UNPACK_TUPLE opcode is so expensive that it makes enumerate slower than incrementing a local. That being said, it's likely an unnecessary micro-optimization that you should probably avoid.
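For reference, a minimal sketch of the enumerate style (an alternative to, not an addition to, the += loop above, since either pass exhausts the reader):

    totalrows = 0
    for totalrows, row in enumerate(countrdr, 1):
        pass  # totalrows ends at the row count; stays 0 for an empty file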
More Notes: If you really just want to generate some kind of progress indicator, it doesn't necessarily have to be record-based. You can tell() on the file object in the loop and just report what % of the data you're through. It'll be a little uneven, but chances are on any file that's large enough to warrant a progress bar, the deviation in record length will be lost in the noise.
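A minimal sketch of that idea, with one wrinkle this answer predates: in Python 3, calling tell() on a text-mode file while the csv reader is iterating it raises "telling position disabled by next() call", so this sketch asks the underlying binary buffer for its position instead:

    import csv
    import os

    path = 'myFile.csv'
    filesize = os.path.getsize(path)

    with open(path, newline='') as f:
        myreader = csv.DictReader(f)
        for row in myreader:
            # f.buffer.tell() advances in buffered chunks, so the
            # percentage jumps a little, which is fine for a progress bar
            print("Progress: %.0f%%" % (100.0 * f.buffer.tell() / filesize))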