
Number of lines in csv.DictReader

I have a csv DictReader object (using Python 3.1), and I would like to know the number of lines/rows contained in the reader before I iterate through it. Something like the following:

myreader = csv.DictReader(open('myFile.csv', newline=''))
totalrows = ?
rowcount = 0
for row in myreader:
    rowcount += 1
    print("Row %d/%d" % (rowcount, totalrows))

I know I could get the total by iterating through the reader, but then I couldn't run the 'for' loop. I could iterate through a copy of the reader, but I cannot find how to copy an iterator.

I could also use

totalrows = len(open('myFile.csv').readlines()) 

but that seems an unnecessary re-opening of the file. I would rather get the count from the DictReader if possible.

Any help would be appreciated.

Alan

Alan Harris-Reid asked May 23 '10 03:05



2 Answers

rows = list(myreader)
totalrows = len(rows)
for i, row in enumerate(rows):
    print("Row %d/%d" % (i+1, totalrows))

jfs answered Sep 19 '22 15:09


You only need to open the file once:

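import csv

# Python 3: open in text mode with newline='' (not 'rb')
f = open('myFile.csv', newline='')

# First pass: count the rows
countrdr = csv.DictReader(f)
totalrows = 0
for row in countrdr:
    totalrows += 1

# Rewind: the first pass consumed the file
f.seek(0)

# Second pass: do the real work with a fresh reader
myreader = csv.DictReader(f)
for row in myreader:
    do_work(row)  # placeholder for your own processing

f.close()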

No matter what you do, you have to make two passes. (If your records were a fixed length, which is unlikely, you could just divide the file size by the record length, but let's presume that isn't the case.) Opening the file again really doesn't cost you much, but you can avoid it as illustrated here. Converting to a list just to use len() holds every row in memory at once, which can waste a lot of memory on a large file, and it won't be any faster.
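As a minimal sketch of the two-pass idea, the counting and the rewind can be wrapped in a generator. The name `rows_with_total` is made up for this illustration and is not part of the csv module; it assumes the file has a header row and fits the default dialect.

```python
import csv

def rows_with_total(path):
    """Yield (row_number, total_rows, row) using two passes over one open file.

    Hypothetical helper for illustration only.
    """
    with open(path, newline='') as f:
        # First pass: count data rows without storing them.
        total = sum(1 for _ in csv.DictReader(f))
        # Rewind, because the first pass consumed the file.
        f.seek(0)
        # Second pass: a fresh DictReader re-reads the header line.
        for number, row in enumerate(csv.DictReader(f), start=1):
            yield number, total, row
```

It could then be used as `for n, total, row in rows_with_total('myFile.csv'): print("Row %d/%d" % (n, total))`. The file is opened once and only one row is held in memory at a time.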

Note: The 'Pythonic' way is to use enumerate instead of +=, but unpacking the (index, row) tuple on every iteration (the UNPACK_SEQUENCE opcode in CPython) can make enumerate slower than incrementing a local. That said, this is likely an unnecessary micro-optimization that you should probably avoid.
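If you want to check this on your own interpreter rather than take it on faith, a rough timing sketch along these lines should do. The function names and the input size are arbitrary choices for the illustration, and the result will vary between Python versions and machines.

```python
import timeit

def count_with_increment(items):
    # Increment a local counter on every iteration.
    count = 0
    for _ in items:
        count += 1
    return count

def count_with_enumerate(items):
    # Let enumerate supply the running count; the loop unpacks a tuple each time.
    count = 0
    for count, _ in enumerate(items, start=1):
        pass
    return count

if __name__ == "__main__":
    data = range(100_000)
    for fn in (count_with_increment, count_with_enumerate):
        seconds = timeit.timeit(lambda: fn(data), number=20)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

Both functions return the same count, so the only thing being compared is the loop overhead.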

More notes: If you really just want some kind of progress indicator, it doesn't have to be record-based. You can read the current file position inside the loop and report what percentage of the data you have processed. (In Python 3, tell() on a text-mode file raises OSError while the file is being iterated, so take the position from the underlying binary buffer instead, e.g. f.buffer.tell().) It'll be a little uneven, but on any file large enough to warrant a progress bar, the variation in record length will be lost in the noise.

Nick Bastin answered Sep 19 '22 15:09
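A minimal sketch of byte-based progress in Python 3, assuming the file is opened in text mode as in the question. The helper name `rows_with_progress` is invented for this example. It reads the position from `f.buffer`, the binary stream underneath the text file, which advances one read chunk at a time, so the reported fraction is coarse and jumps in steps.

```python
import csv
import os

def rows_with_progress(path):
    """Yield (fraction_read, row), where fraction_read is bytes consumed / file size.

    Hypothetical helper for illustration only.
    """
    size = os.path.getsize(path)
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            # f.tell() is disabled while a text file is being iterated, so ask
            # the underlying binary buffer instead. It moves in read-sized
            # chunks, so this is an approximation of how far we have got.
            done = f.buffer.tell()
            yield (done / size if size else 1.0), row
```

Only a single pass is needed here, at the cost of a percentage that is approximate rather than an exact row count.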