 

Efficiently reading specific lines from large files into R

Tags:

r

I'm surprised by how long it takes R to read in a specific line from a large file (11GB+). For example:

> t0 = Sys.time()
> read.table('data.csv', skip=5000000, nrows=1, sep=',')
      V1       V2 V3 V4 V5   V6    V7
1 19.062 56.71047  1 16  8 2006 56281
> print(Sys.time() - t0)
Time difference of 49.68314 secs

The OS X terminal can return a specific line almost instantly. Does anyone know a more efficient way to do this in R?
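For reference, the shell one-liner the question alludes to looks like this (the file here is a small generated stand-in for `data.csv`). The `Np;Nq` form prints line N and quits immediately, so `sed` never scans past the target line:

```shell
# Generate a small sample file, then print its 5th line.
seq 1 100 | sed 's/^/row,/' > sample.csv
sed -n '5p;5q' sample.csv
# prints: row,5
```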

geotheory asked Aug 14 '13 15:08
People also ask

How do I read a specific line in a text file in R?

The readLines() function reads text lines from an input file. It is well suited to text files because it reads the file line by line and creates a character object for each line.
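A minimal sketch of that approach (the helper name `line_k` is mine, not a base R function). Note that readLines() still scans from the start of the file, so this is convenient rather than fast for very large files:

```r
# Read the first k lines and keep only the one wanted.
# Still O(k) in the target line number, but avoids parsing columns.
line_k <- function(path, k) {
  con <- file(path, open = "r")
  on.exit(close(con))
  readLines(con, n = k)[k]
}
```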

How do I read a large csv file in R?

If the CSV file is extremely large, the best way to import it into R is the fread() function from the data.table package. The result is returned as a data.table.
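A hedged sketch of the fread() route for a single row (the temp file below is a small stand-in for the question's 11 GB data.csv). fread's C-level skip is considerably faster than read.table's, though it still has to read through the file to locate the offset:

```r
library(data.table)

# Write a small illustrative CSV (stand-in for a huge data.csv).
tf <- tempfile(fileext = ".csv")
writeLines(sprintf("%d,%d", 1:1000, (1:1000)^2), tf)

# skip = 499 jumps over the first 499 lines; nrows = 1 then
# parses only the next one, so row 500 comes back without
# loading the rest of the file into memory.
row500 <- fread(tf, skip = 499, nrows = 1, header = FALSE)
```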


1 Answer

Well, you can use something like this:

 dat <- read.table(pipe("sed -n -e'5000001p' data.csv"), sep=',')

to read just the line extracted with other shell tools.

Also note that system.time(someOps) is an easier way to measure time.
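For instance, putting the two together (the temp file and line number here are illustrative stand-ins for the question's data.csv):

```r
# Same pipe trick as above, timed with system.time(), which
# returns user/system/elapsed seconds for the expression
# without the manual Sys.time() bookkeeping.
tf <- tempfile(fileext = ".csv")
writeLines(sprintf("%d,%d", 1:1000, 1:1000 * 2), tf)

elapsed <- system.time(
  dat <- read.table(pipe(sprintf("sed -n -e'500p' %s", tf)), sep = ",")
)
```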

Dirk Eddelbuettel answered Oct 18 '22 16:10