Is it possible to get the number of rows in a CSV file without opening it?

Tags: r, csv, excel

I have a CSV file of about 1 GB, and since my laptop has a basic configuration, I can't open the file in Excel or R. Out of curiosity, I would like to get the number of rows in the file. How can I do that, if it is possible at all?

Ohhm Prakash asked Oct 02 '15 18:10


2 Answers

For Linux/Unix:

wc -l filename 

For Windows:

find /c /v "A String that is extremely unlikely to occur" filename 
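One caveat worth sketching for the Linux/Unix command: wc -l counts newline characters, so a file whose last row has no trailing newline is reported one line short. The example below is a minimal illustration, not part of the original answer; the sample file and the variable names are made up for the demonstration, and grep -c '' is shown as one common way to count the final unterminated line as well.

```shell
# Create a small sample CSV (hypothetical data) whose last line has NO trailing newline.
csv_file=$(mktemp)
printf 'id,name\n1,a\n2,b' > "$csv_file"

# wc -l counts newline characters; reading from stdin keeps the filename out of the output.
lines_wc=$(wc -l < "$csv_file")

# grep -c '' counts every line, including a final line without a newline.
lines_grep=$(grep -c '' "$csv_file")

echo "wc -l:      $lines_wc"
echo "grep -c '': $lines_grep"

# Clean up the temporary file.
rm -f "$csv_file"
```

Either count includes the header line, so subtract one if you only want the data rows.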
Tony Ruth answered Sep 17 '22 08:09


Option 1:

count.fields() reads the file and counts the number of fields on each line, based on some sep value (which we don't care about here). The result has one element per line, so its length should give us the number of lines (and therefore rows) in the file.

length(count.fields(filename)) 

If you have a header row, you can skip it with skip = 1:

length(count.fields(filename, skip = 1)) 

There are other arguments that you can adjust for your specific needs, like skipping blank lines.

args(count.fields)
# function (file, sep = "", quote = "\"'", skip = 0, blank.lines.skip = TRUE,
#     comment.char = "#")
# NULL

See help(count.fields) for more.
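For comparison outside R, the same two adjustments (skipping a header row and ignoring blank lines) can be roughly approximated in a shell. This is only a sketch under assumptions: the sample file and variable names are hypothetical, and unlike count.fields() it knows nothing about quoting or comment characters.

```shell
# Create a small sample CSV (hypothetical data) with a header and two blank-looking lines.
csv_file=$(mktemp)
printf 'id,name\n1,a\n\n2,b\n   \n3,c\n' > "$csv_file"

# tail -n +2 drops the header line; grep -c counts the remaining lines
# that contain at least one non-whitespace character.
data_rows=$(tail -n +2 "$csv_file" | grep -c '[^[:space:]]')

echo "data rows: $data_rows"

# Clean up the temporary file.
rm -f "$csv_file"
```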

It's not too bad as far as speed goes. I tested it on one of my baseball files that contains 99846 rows.

nrow(data.table::fread("Batting.csv"))
# [1] 99846

system.time({ l <- length(count.fields("Batting.csv", skip = 1)) })
#    user  system elapsed
#   0.528   0.000   0.503

l
# [1] 99846

file.info("Batting.csv")$size
# [1] 6153740

Option 2 (more efficient):

Another idea is to use data.table::fread() to read only the first column, then take the number of rows. This would be very fast.

system.time(nrow(data.table::fread("Batting.csv", select = 1L)))
#    user  system elapsed
#   0.063   0.000   0.063
Rich Scriven answered Sep 19 '22 08:09