
Change the number of rows read.table uses to determine the number of columns in R

Tags:

r

read.table

?read.table states that:

The number of data columns is determined by looking at the first five lines of input
(or the whole file if it has less than five lines), or from the length of col.names
if it is specified and is longer. This could conceivably be wrong if fill or
blank.lines.skip are true, so specify col.names if necessary (as in the ‘Examples’).

I need to use the fill parameter, and in some of my txt files the row with the highest number of columns comes after the 5th row. I can't use a header, because the files don't have one, and the col.names will only be defined after the import. So I would like read.table to inspect the whole file instead of just these 5 rows (I don't mind any speed loss). Any suggestions? Thanks!

EDIT:

I just found this in the code of read.table:

if (skip > 0L) 
    readLines(file, skip)
nlines <- n0lines <- if (nrows < 0L) 
    5
else min(5L, (header + nrows))
lines <- .External(C_readtablehead, file, nlines, comment.char, 
    blank.lines.skip, quote, sep)
nlines <- length(lines)

Can I just change the number 5 on the 4th line of the code above? Would that have any side effects on read.table's behaviour?

EDIT 2:

I'm currently using this method:

maxCol <- max(sapply(readLines(filesPath), function(x) length(strsplit(x, ",")[[1]])))

to get the maximum number of columns, and I use the result to create dummy col.names like paste0("V", seq_len(maxCol)). Do you think it is still worth having another version of read.table with an option to choose that?

Michele asked May 16 '13 10:05


1 Answer

Use count.fields, e.g.,

read.table(filesPath, colClasses=rep(NA, max(count.fields(filesPath))), fill=TRUE)
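For illustration, here is a minimal, self-contained sketch of the same idea on a throwaway file. The file contents and the names path, narrow, wide and n_col are made up for the example. It assumes a comma-separated file, as in the question's strsplit call, so sep = "," is passed to both count.fields and read.table. It also passes the column count through col.names rather than colClasses, because the documentation quoted in the question explicitly says the length of col.names is taken into account.

```r
# Hypothetical example file: the widest row (4 fields) comes after line 5.
path <- tempfile(fileext = ".txt")
writeLines(c("a,b", "c,d", "e,f", "g,h", "i,j", "k,l,m,n"), path)

# Default behaviour: only the first five lines are inspected,
# so read.table settles on two columns.
narrow <- read.table(path, sep = ",", fill = TRUE)

# count.fields() looks at every line of the file.
# Its sep must match the one given to read.table.
n_col <- max(count.fields(path, sep = ","), na.rm = TRUE)

# Supplying that many dummy column names makes read.table use the full width.
wide <- read.table(path, sep = ",", fill = TRUE,
                   col.names = paste0("V", seq_len(n_col)))

ncol(narrow)  # number of columns with the default detection
dim(wide)     # rows and columns after fixing the width
```

The na.rm = TRUE guard is there because count.fields can return NA for lines that fall inside a multi-line quoted string.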
Matthew Plourde answered Sep 24 '22 07:09