How to obtain the total number of rows from a CSV file in Python?

2018-10-29 EDIT

Thank you for the comments.

I timed several ways of counting the lines in a CSV file. The fastest method was this one:

with open(filename) as f:
    row_count = sum(1 for line in f)

Here is the code I tested.

import timeit
import csv
import pandas as pd

filename = './sample_submission.csv'

def talktime(filename, funcname, func):
    """Print the mean time of 100 calls to func(filename), then its result."""
    print(f"# {funcname}")
    t = timeit.timeit(f'{funcname}("{filename}")', setup=f'from __main__ import {funcname}', number=100) / 100
    print('Elapsed time : ', t)
    print('n = ', func(filename))
    print('\n')

def sum1forline(filename):
    with open(filename) as f:
        return sum(1 for line in f)
talktime(filename, 'sum1forline', sum1forline)

def lenopenreadlines(filename):
    with open(filename) as f:
        return len(f.readlines())
talktime(filename, 'lenopenreadlines', lenopenreadlines)

def lenpd(filename):
    # read_csv treats the first line as the header, so add 1 to count it too
    return len(pd.read_csv(filename)) + 1
talktime(filename, 'lenpd', lenpd)

def csvreaderfor(filename):
    cnt = 0
    with open(filename) as f:
        cr = csv.reader(f)
        for row in cr:
            cnt += 1
    return cnt
talktime(filename, 'csvreaderfor', csvreaderfor)

def openenum(filename):
    cnt = 0
    with open(filename) as f:
        for i, line in enumerate(f,1):
            cnt += 1
    return cnt
talktime(filename, 'openenum', openenum)

The results are below (elapsed time is in seconds per call, averaged over 100 runs).

# sum1forline
Elapsed time :  0.6327946722068599
n =  2528244


# lenopenreadlines
Elapsed time :  0.655304473598555
n =  2528244


# lenpd
Elapsed time :  0.7561274056295324
n =  2528244


# csvreaderfor
Elapsed time :  1.5571560935772661
n =  2528244


# openenum
Elapsed time :  0.773000013928679
n =  2528244

In conclusion, sum(1 for line in f) was the fastest, although its difference from len(f.readlines()) may not be significant.

sample_submission.csv is 30.2 MB and has 31 million characters.

To do this, you need a bit of code like my example here:

with open("Task1.csv") as file:
    numline = len(file.readlines())
print(numline)

I hope this helps everyone.

Several of the above suggestions count the number of LINES in the csv file. But some CSV files will contain quoted strings which themselves contain newline characters. MS CSV files usually delimit records with \r\n, but use \n alone within quoted strings.

For a file like this, counting lines of text (as delimited by newline) in the file will give too large a result. So for an accurate count you need to use csv.reader to read the records.
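
To illustrate the difference, here is a minimal sketch that writes a small file containing a quoted field with an embedded newline and counts it both ways. The sample data, the file name, and the helper names count_lines and count_records are made up for this example.

```python
import csv
import os
import tempfile


def count_lines(path):
    """Count physical lines of text, as the line-based answers do."""
    with open(path, newline="") as f:
        return sum(1 for _ in f)


def count_records(path):
    """Count CSV records; csv.reader keeps a quoted newline inside one record."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f))


# Hypothetical sample: a header plus two records, one containing a quoted "\n".
sample = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "quoted_newline.csv")
    # newline="" writes the sample bytes without translating line endings.
    with open(path, "w", newline="") as f:
        f.write(sample)
    line_count = count_lines(path)
    record_count = count_records(path)

print(line_count, record_count)
```

For this sample the line count is 4 while the record count is 3, because the quoted field spans two physical lines.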

First, open the file with open. Read mode is enough, and the csv module documentation recommends passing newline="":

input_file = open("nameOfFile.csv", "r", newline="")

Then pass the file to csv.reader to parse the CSV:

reader_file = csv.reader(input_file)

Finally, get the number of rows with len:

value = len(list(reader_file))

The complete code is this:

import csv

input_file = open("nameOfFile.csv", "r", newline="")
reader_file = csv.reader(input_file)
value = len(list(reader_file))

Remember that if you want to reuse the file afterwards, you have to call input_file.seek(0): building a list from reader_file reads the whole file and leaves the file pointer at the end.
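
A minimal sketch of rewinding with the file object's seek method, assuming a small throwaway file created just for the demonstration (its contents and the variable names are hypothetical):

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nameOfFile.csv")
    # Create a hypothetical three-row file to read back.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([["a", "b"], [1, 2], [3, 4]])

    with open(path, newline="") as input_file:
        # The first pass consumes the whole file.
        first_pass = len(list(csv.reader(input_file)))
        # Without rewinding, a new reader starts at end of file and sees nothing.
        without_rewind = len(list(csv.reader(input_file)))
        # seek(0) moves the file pointer back to the start.
        input_file.seek(0)
        second_pass = len(list(csv.reader(input_file)))

print(first_pass, without_rewind, second_pass)
```

Using a with block also closes the file once the counts have been taken.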

row_count = sum(1 for line in open(filename)) worked for me.

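Note: sum(1 for line in csv.reader(filename)) does not count the rows of the file. csv.reader expects an open file object (or another iterable of lines); given the filename string, it iterates over the characters of the name, so the result depends on the filename and not on the file's contents.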
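
The pitfall can be reproduced with a short sketch. The file name demo.csv and the helper count_rows are hypothetical names chosen for this illustration.

```python
import csv
import os
import tempfile


def count_rows(path):
    """Count rows by handing csv.reader an open file object."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f))


# Passing a string makes csv.reader treat each character as one "line".
name = "demo.csv"
wrong_count = sum(1 for _ in csv.reader(name))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, name)
    # Write a hypothetical file with a header and two data rows.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([["id", "name"], [1, "x"], [2, "y"]])
    right_count = count_rows(path)

print(wrong_count, right_count)
```

Here wrong_count equals the number of characters in the name, whereas right_count reflects the three rows actually stored in the file.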
