Skip all leading empty lines in read.csv

Tags: import, r, read.csv

I want to import CSV files into R, with the first non-empty line supplying the column names of the data frame. I know that the skip argument (default skip = 0) specifies how many lines to skip before reading begins. However, the row number of the first non-empty line can change between files.

How do I work out how many lines are empty, and dynamically skip them for each file?

As pointed out in the comments, I need to clarify what "blank" means. My csv files look like:

,,,
w,x,y,z
a,b,5,c
a,b,5,c
a,b,5,c
a,b,4,c
a,b,4,c
a,b,4,c

which means there are rows of commas at the start.

Alex asked Oct 20 '14 00:10


2 Answers

read.csv automatically skips blank lines (unless you set blank.lines.skip=FALSE). See ?read.csv
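
To illustrate that default, here is a minimal sketch using a temporary sample file (the file contents and the variable names path, df and df2 are assumptions made for the demonstration). It also shows why a comma-only line is a different case: such a line is not blank, so read.csv treats it as the header.

```r
# Hypothetical sample file: two truly empty lines, then header and data
path <- tempfile(fileext = ".csv")
writeLines(c("", "", "w,x,y,z", "a,b,5,c", "a,b,4,c"), path)

# Default blank.lines.skip = TRUE: empty lines are dropped before the header is read
df <- read.csv(path)
names(df)

# A comma-only first line is not blank, so it is taken as the header instead
writeLines(c(",,,", "w,x,y,z", "a,b,5,c"), path)
df2 <- read.csv(path)
names(df2)
```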

After I wrote the above, the poster explained that the "blank" lines are not actually empty: they contain commas with nothing between them. In that case, use fread from the data.table package, which can handle this. Its skip= argument can be set to a character string that appears in the header line, and reading then starts at the first line containing that string:

library(data.table)
DT <- fread("myfile.csv", skip = "w") # assuming w is in the header
DF <- as.data.frame(DT)

The last line can be omitted if a data.table is ok as the returned value.

G. Grothendieck answered Sep 29 '22 17:09


Depending on your file size, this may not be the best solution, but it will do the job.

The strategy here is to read the file as plain lines instead of parsing it with a delimiter, count the characters in each line, and store the counts in temp. A while loop then finds the first line with a non-zero length, and the file is read from that line and stored as data_<filename>.

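flist = list.files()
for (onefile in flist) {
  # Character count of every line; truly empty lines have length 0
  temp = nchar(readLines(onefile))
  i = 1
  # Advance to the first line with non-zero length (stop at end of file)
  while (i <= length(temp) && temp[i] == 0) {
    i = i + 1
  }
  # Skip the i - 1 leading empty lines and read the rest as comma-separated data
  temp = read.table(onefile, sep = ",", skip = (i-1))
  # Store the result in a variable named data_<filename>
  assign(paste0("data_", onefile), temp)
}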

If the file contains headers, you can start i from 2.
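
The loop above only detects lines of length zero, so it would not skip the comma-only rows shown in the question. A minimal base R sketch of the same read-as-lines strategy, extended to that case, might look like the following. The helper name read_csv_skip_leading and the sample file are hypothetical, and the sketch assumes that a leading "empty" row contains nothing but commas and whitespace (a row of quoted empty fields would count as content).

```r
# Hypothetical helper: read a CSV whose leading lines are empty or contain only commas
read_csv_skip_leading <- function(path, ...) {
  lines <- readLines(path)
  # A line has content if it holds any character other than a comma or whitespace
  has_content <- grepl("[^,[:space:]]", lines)
  if (!any(has_content)) stop("no non-empty line found in ", path)
  first <- which(has_content)[1]
  # Skip everything before the first line with content; that line becomes the header
  read.csv(path, skip = first - 1, header = TRUE, ...)
}

# Usage on a sample file shaped like the one in the question
path <- tempfile(fileext = ".csv")
writeLines(c(",,,", "", "w,x,y,z", "a,b,5,c", "a,b,4,c"), path)
df <- read_csv_skip_leading(path)
str(df)
```

Because the number of lines to skip is computed per file, the same call can be used inside a loop over list.files() without knowing in advance where each header sits.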

won782 answered Sep 29 '22 15:09