I'm surprised by how long it takes R to read in a specific line from a large file (11GB+). For example:
> t0 = Sys.time()
> read.table('data.csv', skip=5000000, nrows=1, sep=',')
V1 V2 V3 V4 V5 V6 V7
1 19.062 56.71047 1 16 8 2006 56281
> print(Sys.time() - t0)
Time difference of 49.68314 secs
The OS X terminal (e.g. with sed or awk) can return a specific line almost instantly. Does anyone know a more efficient way to do this in R?
One option is readLines(): it reads text lines from a file and returns a character vector with one element per line. Because it skips the column parsing that read.table() does, it is well suited to pulling raw lines out of a text file, though it still has to scan the file from the beginning.
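As a minimal sketch (using a small temporary file as a stand-in for the 11GB data.csv), readLines() can fetch a single line and you can split it into fields yourself:

```r
# Write a small sample CSV (stand-in for the large file)
tmp <- tempfile(fileext = ".csv")
writeLines(sprintf("%d,%d,%d", 1:100, (1:100)^2, (1:100) * 2), tmp)

# readLines() returns raw character lines; take the 42nd one
line42 <- readLines(tmp, n = 42)[42]

# Split the line on commas to get the individual fields
fields <- strsplit(line42, ",")[[1]]
print(fields)  # "42" "1764" "84"
```

Note that readLines() still reads the first n lines into memory, so for a line deep inside an 11GB file it saves parsing time but not I/O.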
If the CSV file is extremely large, another option is fread() from the data.table package. It is a much faster parser than read.table() and returns the result as a data.table.
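A hedged sketch of the fread() approach, assuming the data.table package is installed; skip and nrows behave as in read.table(), so skip = 499 with nrows = 1 reads line 500 of a headerless file:

```r
library(data.table)  # assumes data.table is installed

# Small sample file standing in for the large CSV
tmp <- tempfile(fileext = ".csv")
writeLines(sprintf("%d,%d", 1:1000, (1:1000) * 10), tmp)

# Skip 499 lines, read exactly one row -> line 500 of the file
row <- fread(tmp, skip = 499, nrows = 1, header = FALSE)
print(row)  # V1 = 500, V2 = 5000
```

fread() still has to seek past the skipped lines, but its scanner is far faster than read.table()'s, so the same skip/nrows call typically completes in a fraction of the time.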
You can use something like

dat <- read.table(pipe("sed -n -e '5000001p' data.csv"), sep = ',')

to read just the one line, letting a shell tool extract it first. (Note that skip = 5000000 in read.table() corresponds to line 5000001 of the file, which is why the sed address is 5000001.)
Also note that system.time(someOps) is an easier way to measure time than diffing two Sys.time() calls.
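For example, system.time() wraps any expression and reports user, system, and elapsed times directly (the summed expression here is just a placeholder workload):

```r
# system.time() evaluates the expression and returns a proc_time object
timing <- system.time({
  x <- sum(sqrt(1:1e6))  # placeholder workload
})

print(timing["elapsed"])  # elapsed wall-clock seconds
```

This replaces the t0 <- Sys.time(); ...; Sys.time() - t0 pattern from the question with a single call.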