 

Efficiently reading specific lines from large files into R

Tags:

r

I'm surprised by how long it takes R to read in a specific line from a large file (11GB+). For example:

> t0 = Sys.time()
> read.table('data.csv', skip=5000000, nrows=1, sep=',')
      V1       V2 V3 V4 V5   V6    V7
1 19.062 56.71047  1 16  8 2006 56281
> print(Sys.time() - t0)
Time difference of 49.68314 secs

The OS X terminal can return a specific line almost instantly. Does anyone know a more efficient way to do this in R?
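For reference, the shell one-liner the question alludes to looks like this (the file here is a small generated stand-in for `data.csv`). The `Np;Nq` form prints line N and quits immediately, so `sed` never scans past the target line:

```shell
# Generate a small sample file, then print its 5th line.
seq 1 100 | sed 's/^/row,/' > sample.csv
sed -n '5p;5q' sample.csv
# prints: row,5
```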

geotheory asked Aug 14 '13 15:08
People also ask

How do I read a specific line in a text file in R?

The readLines() function reads text lines from an input file. It is well suited to text files because it reads the file line by line and creates a character object for each line.
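A minimal sketch of that approach (the helper name `line_k` is mine, not a base R function). Note that readLines() still scans from the start of the file, so this is convenient rather than fast for very large files:

```r
# Read the first k lines and keep only the one wanted.
# Still O(k) in the target line number, but avoids parsing columns.
line_k <- function(path, k) {
  con <- file(path, open = "r")
  on.exit(close(con))
  readLines(con, n = k)[k]
}
```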

How do I read a large csv file in R?

If the CSV file is extremely large, the best way to import it into R is the fread() function from the data.table package. The result is returned as a data.table.
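A hedged sketch of the fread() route for a single row (the temp file below is a small stand-in for the question's 11 GB data.csv). fread's C-level skip is considerably faster than read.table's, though it still has to read through the file to locate the offset:

```r
library(data.table)

# Write a small illustrative CSV (stand-in for a huge data.csv).
tf <- tempfile(fileext = ".csv")
writeLines(sprintf("%d,%d", 1:1000, (1:1000)^2), tf)

# skip = 499 jumps over the first 499 lines; nrows = 1 then
# parses only the next one, so row 500 comes back without
# loading the rest of the file into memory.
row500 <- fread(tf, skip = 499, nrows = 1, header = FALSE)
```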


1 Answer

Well, you can use something like this:

 dat <- read.table(pipe("sed -n -e'5000001p' data.csv"), sep=',')

to read just the line extracted with other shell tools.

Also note that system.time(someOps) is an easier way to measure time.
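For instance, putting the two together (the temp file and line number here are illustrative stand-ins for the question's data.csv):

```r
# Same pipe trick as above, timed with system.time(), which
# returns user/system/elapsed seconds for the expression
# without the manual Sys.time() bookkeeping.
tf <- tempfile(fileext = ".csv")
writeLines(sprintf("%d,%d", 1:1000, 1:1000 * 2), tf)

elapsed <- system.time(
  dat <- read.table(pipe(sprintf("sed -n -e'500p' %s", tf)), sep = ",")
)
```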

Dirk Eddelbuettel answered Oct 18 '22 16:10