
Read Large File line by line in R without header

Tags:

r

csv

I have a very large data file (several gigabytes). If I try to open it in R, I get an out-of-memory error.

I need to read the file line by line and do some analysis. I found a previous question on this issue in which the file was read n lines at a time, jumping to a particular block of lines through a clump argument. I used the answer by "Nick Sabbe" and made some modifications to fit my needs.

Consider the following test.csv, which is a sample of the file:

A    B    C
200 19  0.1
400 18  0.1
300 29  0.1
800 88  0.1
600 80  0.1
150 50  0.1
190 33  0.1
270 42  0.1
900 73  0.1
730 95  0.1

I want to read the content of the file line by line and perform my analysis, so I created the following loop based on the code posted by "Nick Sabbe". I have two problems: 1) The header is printed every time I print a new line. 2) The index column "X" added by R is also printed, although I delete this column.

Here is the code I'm using:

test <- function() {
  prev <- 0
  for (i in 1:100) {
    j <- i - prev
    test1 <- read.clump("file.csv", j, i)
    print(test1)
    prev <- i
  }
}
####################
# Code by Nick Sabbe
###################
read.clump <- function(file, lines, clump, readFunc=read.csv,
                       skip=(lines*(clump-1))+ifelse((header) & (clump>1) & (!inherits(file, "connection")),1,0),
                       nrows=lines, header=TRUE, ...){
  if(clump > 1){
    colnms <- NULL
    if(header)
    {
      colnms <- unlist(readFunc(file, nrows=1, header=F))
      #print(colnms)
    }
    p = readFunc(file, skip = skip,
                 nrows = nrows, header=FALSE, ...)
    if(! is.null(colnms))
    {
      colnames(p) = colnms
    }
  } else {
    p = readFunc(file, skip = skip, nrows = nrows, header=header)
  }
  p$X <- NULL   # Note: Here I'm setting the index to NULL
  return(p)
}

The output I'm getting:

       A       B    C
1      200      19   0.1
  NA   1       1     1
1  2   400     18   0.1
  NA   1       1    1
1  3   300     29   0.1
  NA   1       1    1
1  4   800     88   0.1
  NA   1       1    1
1  5   600     80   0.1

I want to get rid of this line in the rest of the output:

 NA   1       1     1

Also, is there any way to make the for loop stop at the end of the file, like an EOF check in other languages?

SimpleNEasy asked Dec 04 '12 20:12



1 Answer

Maybe something like this can help you:

inputFile <- "foo.txt"
con  <- file(inputFile, open = "r")
while (length(oneLine <- readLines(con, n = 1)) > 0) {
  myLine <- unlist((strsplit(oneLine, ",")))
  print(myLine)
} 
close(con)
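To avoid the header showing up again for every row, one option is to read it once before the loop and reuse it as names. The following is only a minimal sketch under assumptions that are not in the question: a comma-separated file with purely numeric data rows, written here to a temporary file with made-up values. The names `tmp`, `header`, `values` and `rows_seen` are illustrative.

```r
# Build a small comma-separated sample file (made-up data for illustration).
tmp <- tempfile(fileext = ".csv")
writeLines(c("A,B,C", "200,19,0.1", "400,18,0.1"), tmp)

con <- file(tmp, open = "r")
# Read the header line exactly once, before the loop.
header <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]
rows_seen <- 0
# readLines() returns a zero-length vector at end of file, which ends the loop.
while (length(oneLine <- readLines(con, n = 1)) > 0) {
  values <- as.numeric(strsplit(oneLine, ",", fixed = TRUE)[[1]])
  names(values) <- header   # label the values with the column names
  print(values)             # replace with the real per-line analysis
  rows_seen <- rows_seen + 1
}
close(con)
```

Because the connection stays open between calls, each `readLines()` continues where the previous one stopped, so no line counter or `skip` arithmetic is needed.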

Or use scan to avoid the splitting step, as @MatthewPlourde suggested:

I use scan: I skip the header, and set quiet = TRUE to suppress the message saying how many items have been read.

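con <- file(inputFile, open = "r")
## skip the header once; skip=1 inside scan would drop a line on every call
invisible(readLines(con, n = 1))
while (length(myLine <- scan(con, what = "character", nlines = 1, sep = ",", quiet = TRUE)) > 0) {
   ## here I print, but you would process your line here
   print(as.numeric(myLine))
}
close(con)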
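If the analysis can work on several rows at a time, reading in chunks is usually faster than one line per call. Here is a sketch of that idea, not a definitive implementation: `process_in_chunks` is a hypothetical helper, and it assumes a comma-separated file whose first line is the header. It pulls raw lines with `readLines()` and parses each chunk with `read.csv(text = ...)`, so the loop ends cleanly at end of file.

```r
# Hypothetical helper: apply `fun` to successive chunks of a delimited file.
process_in_chunks <- function(path, fun, chunk_size = 1000L, sep = ",") {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Read the header once and reuse it for every chunk.
  col_names <- strsplit(readLines(con, n = 1L), sep, fixed = TRUE)[[1]]
  results <- list()
  # readLines() returns character(0) at end of file, which ends the loop.
  while (length(lines <- readLines(con, n = chunk_size)) > 0) {
    chunk <- read.csv(text = lines, header = FALSE, sep = sep,
                      col.names = col_names)
    results[[length(results) + 1L]] <- fun(chunk)
  }
  results
}

# Usage on a small temporary file (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("A,B,C", "200,19,0.1", "400,18,0.1", "300,29,0.1"), tmp)
chunk_sums <- process_in_chunks(tmp, function(d) sum(d$A), chunk_size = 2L)
print(unlist(chunk_sums))
```

Only one chunk is held in memory at a time, so `chunk_size` can be tuned to the available memory; the per-chunk results are kept in a list, which is assumed to be small compared with the file.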
agstudy answered Sep 22 '22 04:09
