
Get length of CSV to show progress

Tags: python, csv

I am working with a large number of CSV files, each of which contains a large number of rows. My goal is to take the data line by line and write it to a database using Python. However, because there is so much data, I would like to keep track of how much has been written. For the files, I count how many are queued and increment a counter each time a file is complete.

I would like to do something similar for the rows of each CSV file and show which row I am on and how many rows there are in total (for example: Currently on row 1 of X). I can easily get the current row by starting at one and doing something like currentRow += 1; however, I am unsure how to get the total without going through the time-consuming process of reading every line.

Additionally, because my CSV files are all stored in zip archives, I am currently reading them using the ZipFile class from the zipfile module like this:

# The zip archive and the CSV file share the same name
with zipArchive.open(fileName[:-4] + '.csv', 'r') as csvFile:
    lines = (line.decode('ascii') for line in csvFile)
    currentRow = 1

    for row in csv.reader(lines):
        print(row)
        currentRow += 1

Any ideas on how I can quickly get a total row count of a CSV file?

ng150716 asked Aug 18 '16 18:08



1 Answer

If you just want to show some progress, you could try using tqdm.

import csv

from tqdm import tqdm

with zipArchive.open(fileName[:-4] + '.csv', 'r') as csvFile:
    # Read every line into a list so that len(lines) gives tqdm its total.
    lines = [line.decode('ascii') for line in csvFile]
    currentRow = 1

    for row in tqdm(csv.reader(lines), total=len(lines)):
        print(row)
        currentRow += 1

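This should give you a sleek progress bar with virtually no effort on your part. Note that building the list loads the whole file into memory, and len(lines) counts physical lines, which matches the number of CSV rows only when no quoted field contains a newline.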
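
If holding every line in a list is too costly for very large files, one alternative is to measure progress in bytes rather than rows. The sketch below is a hypothetical variation, not the method from the answer above; `count_lines` and `rows_with_progress` are illustrative names. It assumes ASCII-encoded CSVs as in the question and takes the total from `ZipFile.getinfo(...).file_size`, the uncompressed size recorded in the archive, so no counting pass is needed. `count_lines` shows a streaming counting pass for cases where a row total is really wanted; it counts physical lines, which can exceed the number of CSV rows if quoted fields contain newlines.

```python
import csv
import zipfile


def count_lines(archive: zipfile.ZipFile, member: str, chunk_size: int = 1 << 20) -> int:
    """Count physical lines in a zip member without keeping them in memory."""
    count = 0
    last = b""
    with archive.open(member, "r") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
            last = chunk[-1:]
    # A final line with no trailing newline still counts as a line.
    if last and last != b"\n":
        count += 1
    return count


def rows_with_progress(archive: zipfile.ZipFile, member: str, encoding: str = "ascii"):
    """Yield (row, bytes_read, total_bytes) for each CSV row in a zip member."""
    # Uncompressed size as stored in the archive's directory entry.
    total = archive.getinfo(member).file_size
    done = 0

    def decoded(f):
        nonlocal done
        for raw in f:
            done += len(raw)  # track bytes consumed so far
            yield raw.decode(encoding)

    with archive.open(member, "r") as f:
        for row in csv.reader(decoded(f)):
            yield row, done, total
```

The `bytes_read` and `total_bytes` values could then drive a tqdm bar created with `tqdm(total=total_bytes, unit='B')`, calling `update()` with the number of bytes consumed since the previous row.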

Lily Mara answered Oct 15 '22 09:10