I am working with a large number of CSV files, each of which contains a large number of rows. My goal is to take the data line by line and write it to a database using Python. However, because there is so much data, I would like to keep track of how much has been written. For this I count the number of files being queued and add one to a counter every time a file is complete.
I would like to do something similar for the CSV files and show which row I am on and how many rows there are in total (for example: Currently on row 1 of X). I can easily get the current row by starting at one and then doing something like currentRow += 1; however, I am unsure how to get the total without going through the time-consuming process of reading every line.
Additionally, because my CSV files are all stored in zip archives, I am currently reading them using the zipfile module like this:
# The zip archive and the CSV file share the same name
with zipArchive.open(fileName[:-4] + '.csv', 'r') as csvFile:
    lines = (line.decode('ascii') for line in csvFile)
    currentRow = 1
    for row in csv.reader(lines):
        print(row)
        currentRow += 1
Any ideas on how I can quickly get a total row count of a CSV file?
Using the len() function: with this method, we read the CSV file using the pandas library and then call len() on the resulting DataFrame, which returns the number of rows in the CSV file as an int (not counting the header line when pandas treats the first line as column names).
In R, count.fields() reads through a file connection and counts the number of fields on each line of the file based on some sep value (which we don't care about here). So if we take the length of that result, we should in theory end up with the number of lines (and rows) in the file. See help(count.
So, how do you open large CSV files in Excel? Essentially, there are two options: Split the CSV file into multiple smaller files that do fit within the 1,048,576 row limit; or, Find an Excel add-in that supports CSV files with a higher number of rows.
If you just want to show some progress, you could try using tqdm.
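import csv
from tqdm import tqdm

with zipArchive.open(fileName[:-4] + '.csv', 'r') as csvFile:
    # A list (not a generator) so that len() can supply the total;
    # note that this loads every line of the file into memory.
    lines = [line.decode('ascii') for line in csvFile]
    currentRow = 1
    for row in tqdm(csv.reader(lines), total=len(lines)):
        print(row)
        currentRow += 1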
This should give you a sleek progress bar with virtually no effort on your part.
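If holding every line in memory is a concern, one alternative is to read the zip member twice: once just to count lines, and once to parse. The sketch below is only an illustration under a few assumptions: the helper names count_lines and rows_with_progress are made up for this example, the files are ASCII as in the question, and no quoted field contains an embedded newline (otherwise the line count will be higher than the number of CSV rows). It still reads the whole file once to get the total, so it saves memory rather than time.

```python
import csv
import io
import zipfile


def count_lines(archive, member):
    """Count physical lines in a zip member without keeping them in memory.

    Hypothetical helper: `archive` is an open zipfile.ZipFile and
    `member` is the name of the CSV file inside it.
    """
    with archive.open(member, "r") as raw:
        return sum(1 for _ in raw)


def rows_with_progress(archive, member, encoding="ascii"):
    """Yield (current_row, total_rows, row) for each row of the CSV member."""
    total = count_lines(archive, member)  # first pass: count only
    with archive.open(member, "r") as raw:  # second pass: parse the rows
        # newline="" lets the csv module handle line endings itself
        text = io.TextIOWrapper(raw, encoding=encoding, newline="")
        for current, row in enumerate(csv.reader(text), start=1):
            yield current, total, row
```

With an open archive this could be used as: for current, total, row in rows_with_progress(zipArchive, fileName[:-4] + '.csv'): print(f"Currently on row {current} of {total}"). The same total could also be passed to tqdm through its total argument instead of len(lines).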