Is it possible to get the number of rows in a CSV file without opening it?

I have a CSV file of about 1 GB, and because my laptop is fairly low-spec, I can't open the file in Excel or R. Out of curiosity, I would still like to know how many rows the file has. How can I do this, if it is possible at all?

981

asked Oct 02 '15 18:10

Ohhm Prakash

2 Answers

For Linux/Unix:

wc -l filename
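Two details of wc -l matter for CSV files: it counts newline characters, so the header line is included, and a last line without a trailing newline is not counted. The sketch below builds a small throwaway file (the sample data and variable names are made up for illustration) and shows one way to account for both; treat it as a starting point, not the only way to do it.

```shell
# Hypothetical sample file: one header line plus three data rows.
tmp=$(mktemp)
printf 'id,name\n1,a\n2,b\n3,c\n' > "$tmp"

# Reading from stdin keeps the filename out of wc's output;
# $(( ... )) strips the padding some wc implementations add.
total=$(( $(wc -l < "$tmp") ))

# Data rows only: start at line 2 to drop the header, then count.
rows=$(( $(tail -n +2 "$tmp" | wc -l) ))

# Same content, but without a newline after the last row.
printf 'id,name\n1,a\n2,b\n3,c' > "$tmp"
no_newline=$(( $(wc -l < "$tmp") ))

# awk counts records, so the unterminated last line is still included.
with_awk=$(awk 'END { print NR }' "$tmp")

echo "total=$total rows=$rows no_newline=$no_newline with_awk=$with_awk"
# prints: total=4 rows=3 no_newline=3 with_awk=4

rm -f "$tmp"
```

Neither command parses CSV quoting, so a quoted field that contains a line break is still counted as more than one line.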

For Windows:

find /c /v "A String that is extremely unlikely to occur" filename

146

answered Sep 17 '22 08:09

Tony Ruth

Option 1:

count.fields() reads the file through a connection and counts the number of fields on each line, based on some sep value (which we don't care about here). The length of that result should therefore equal the number of lines, and so the number of rows, in the file.

length(count.fields(filename))

If the file has a header row, you can skip it with skip = 1:

length(count.fields(filename, skip = 1))

There are other arguments that you can adjust for your specific needs, like skipping blank lines.

args(count.fields)
# function (file, sep = "", quote = "\"'", skip = 0, blank.lines.skip = TRUE,
#     comment.char = "#")
# NULL

See help(count.fields) for more.

It's not too bad as far as speed goes. I tested it on one of my baseball files that contains 99846 rows.

nrow(data.table::fread("Batting.csv"))
# [1] 99846
system.time({ l <- length(count.fields("Batting.csv", skip = 1)) })
#   user  system elapsed
#  0.528   0.000   0.503
l
# [1] 99846
file.info("Batting.csv")$size
# [1] 6153740

Option 2 (the more efficient):

Another idea is to use data.table::fread() to read the first column only, then take the number of rows. This would be very fast.

system.time(nrow(data.table::fread("Batting.csv", select = 1L)))
#   user  system elapsed
#  0.063   0.000   0.063

answered Sep 19 '22 08:09

Rich Scriven
