
fread to read top n rows from a large file

I am getting the error below when reading the first n rows from a large file (around 50 GB) using fread. It looks like a memory issue. I tried nrows=1000, but no luck. I am using Linux.

file ok but could not memory map it. This is a 64bit process. There is probably not enough contiguous virtual memory available.

Can the fread call below be replaced with read.csv using the same options? Would that help?

rdata <- fread(
    file = csvfile, sep = "|", header = FALSE, col.names = colsinfile,
    select = colstoselect, key = "keycolname", na.strings = c("", "NA"),
    nrows = 500
)
asked Sep 25 '18 by sjd

2 Answers

Another workaround is to fetch only the first 500 lines with a shell command:

rdata <- fread(
    cmd = paste("head -n 500", csvfile),
    sep = "|", header = FALSE, col.names = colsinfile,
    select = colstoselect, key = "keycolname", na.strings = c("", "NA")
)

I don't know why nrows doesn't work, though.
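To illustrate what fread runs under the hood here: with cmd=, it executes the shell command and parses its stdout, so only the first 500 lines ever leave the shell. A tiny demo (the temp file is just an example):

```shell
# Demo of the head workaround on a tiny pipe-delimited sample file.
csvfile=$(mktemp)
printf 'a|1\nb|2\nc|3\nd|4\ne|5\n' > "$csvfile"

# fread(cmd = paste('head -n 500', csvfile)) runs a command like this
# and parses its output; here we take just the first 3 lines.
head -n 3 "$csvfile"
# prints:
# a|1
# b|2
# c|3
```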

answered Sep 19 '22 by mt1022


Perhaps this would help you:

processFile <- function(filepath) {
  con <- file(filepath, "r")
  while (TRUE) {
    line <- readLines(con, n = 1)
    if (length(line) == 0) {
      break
    }
    print(line)
  }
  close(con)
}

See reading a text file in R line by line. In your case you'd probably want to replace the `while (TRUE)` with `for (i in 1:1000)`.
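A minimal sketch of that change, collecting at most the first n lines instead of looping to end of file (processFirstN is a hypothetical name, not from the question):

```r
# Read at most the first n lines of a file, stopping early if the
# file is shorter than n lines.
processFirstN <- function(filepath, n = 1000) {
  con <- file(filepath, "r")
  on.exit(close(con))               # ensure the connection is closed
  lines <- character(0)
  for (i in seq_len(n)) {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break    # reached end of file before n lines
    lines <- c(lines, line)
  }
  lines
}
```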

answered Sep 20 '22 by gaut