The documentation in ?read.table states:
The number of data columns is determined by looking at the first five lines of input
(or the whole file if it has less than five lines), or from the length of col.names
if it is specified and is longer. This could conceivably be wrong if fill or
blank.lines.skip are true, so specify col.names if necessary (as in the ‘Examples’).
I need to use the fill parameter, and in some of my text files the row with the most columns may come after the fifth row. I can't use a header, because the files don't have one, and the col.names will only be defined after the import. So I would like read.table to inspect the whole file instead of only the first five rows (I don't mind any speed loss). Any suggestions? Thanks!
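To make the problem concrete, here is a minimal sketch of the behaviour described above. The sample file and its contents are made up for illustration: six two-field rows followed by a four-field row on line 7, which is past the five lines that read.table inspects.

```r
# Hypothetical sample: six 2-field rows, then one 4-field row on line 7.
tmp <- tempfile(fileext = ".txt")
writeLines(c(rep("a,b", 6), "a,b,c,d"), tmp)

# The column count is inferred from the first five lines only,
# so the table is built with two columns.
narrow <- read.table(tmp, sep = ",", fill = TRUE)
ncol(narrow)

# Supplying enough col.names up front overrides that guess.
wide <- read.table(tmp, sep = ",", fill = TRUE,
                   col.names = paste0("V", 1:4))
dim(wide)
```

In the first call the extra fields of the long row do not get columns of their own, which is why the column count has to be known before the import.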
EDIT:
I just found this in the source of read.table:
    if (skip > 0L)
        readLines(file, skip)
    nlines <- n0lines <- if (nrows < 0L)
        5
    else min(5L, (header + nrows))
    lines <- .External(C_readtablehead, file, nlines, comment.char,
        blank.lines.skip, quote, sep)
    nlines <- length(lines)
Can I just change the number 5 on the 4th line of the code above? Would that have any side effects on read.table's behaviour?
EDIT 2:
I'm currently using this approach
maxCol <- max(sapply(readLines(filesPath), function(x) length(strsplit(x, ",")[[1]])))
to get the maximum number of columns, and then using the result to create dummy col.names like paste0("V", seq_len(maxCol)). Do you think it would still be worth having a version of read.table with an option to choose the number of lines inspected?
Use count.fields, e.g.:
read.table(filesPath, colClasses=rep(NA, max(count.fields(filesPath))), fill=TRUE)
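One detail worth checking: count.fields, like read.table, splits on whitespace by default (sep = ""), so for the comma-separated files mentioned in the question the same sep should be passed to both calls. The following is a rough sketch of that idea. read_ragged is a hypothetical helper name, not a base R function, and the sample file is invented for the demonstration.

```r
# Hypothetical helper (not part of base R): read a ragged delimited file,
# sizing the table from the widest line in the whole file.
read_ragged <- function(path, sep = ",", ...) {
  n_fields <- count.fields(path, sep = sep)
  # count.fields() records NA for the starting line of a quoted
  # string that spans several lines, so drop NAs before taking the max.
  max_col <- max(n_fields, na.rm = TRUE)
  read.table(path, sep = sep, fill = TRUE,
             col.names = paste0("V", seq_len(max_col)), ...)
}

# Made-up sample: the widest row is on line 7, past the first five lines.
tmp <- tempfile(fileext = ".txt")
writeLines(c(rep("a,b", 6), "a,b,c,d"), tmp)
df <- read_ragged(tmp)
dim(df)
```

This reads the file twice, once to count fields and once to parse it, which matches the question's statement that speed is not a concern.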