
read.table and comments in R

I'd like to add metadata to my spreadsheet as comments, and have R ignore these afterwards.

My data are of the form

v1,v2,v3,
1,5,7,
4,2,1,#possible error,

(with the exception that it is much longer. The first comment actually appears well outside of the top 5 rows, which scan uses to determine the number of columns.)

I've been trying:

read.table("data.name",header=TRUE,sep=",",stringsAsFactors=FALSE,comment.char="#")

But read.table (and, for that matter, count.fields) thinks that I have one more field than I actually do. My data frame ends up with a blank column called 'X'. I think this is because my spreadsheet program adds commas to the end of every line (as in the above example).
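The extra field can be confirmed with a quick check. This is only a minimal sketch: it assumes the three sample lines shown above rather than the real file, and `sample_lines` and `n_fields` are names made up for the illustration.

```r
# Sample lines copied from the example above; every line ends in a trailing comma
sample_lines <- c("v1,v2,v3,", "1,5,7,", "4,2,1,#possible error,")

con <- textConnection(sample_lines)
# Count the comma-separated fields per line, ignoring everything after '#'
n_fields <- count.fields(con, sep = ",", comment.char = "#")
close(con)

print(n_fields)
```

Each line should be reported as having four fields rather than three, because the trailing comma is read as the separator before an empty fourth field.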

Using flush=TRUE has no effect, even though (according to the help file) it " [...] allows putting comments after the last field [...]"

Using colClasses=c(rep(NA,3),NULL) has no effect either.

I could just delete the column afterwards, but since it seems that this is a common practice I'd like to learn how to do it properly.

Thanks,

Andrew

AndrewMacDonald asked Oct 07 '12 18:10


1 Answer

From the doc (?read.table):

colClasses character. A vector of classes to be assumed for the columns. Recycled as necessary, or if the character vector is named, unspecified values are taken to be NA.

Possible values are NA (the default, when type.convert is used), "NULL" (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or "factor", "Date" or "POSIXct". Otherwise there needs to be an as method (from package methods) for conversion from "character" to the specified formal class.

Note that it says to use "NULL", not NULL. Indeed, this works as expected:

con <- textConnection("
v1,v2,v3,
1,5,7,
4,2,1,#possible error,
")

read.table(con, header = TRUE, sep = ",",
           stringsAsFactors = FALSE, comment.char = "#",
           colClasses = c(rep(NA, 3), "NULL"))
#   v1 v2 v3
# 1  1  5  7
# 2  4  2  1
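
As for why the unquoted NULL in the question had no effect, a small sketch (the variable names `unquoted` and `quoted` are only for illustration) suggests that c() drops NULL entirely, so read.table receives a three-element vector and recycles it across all four columns:

```r
# c() silently discards NULL, so only the three NA values remain
unquoted <- c(rep(NA, 3), NULL)
# The string "NULL" is kept as a fourth element (the vector becomes character)
quoted <- c(rep(NA, 3), "NULL")

length(unquoted)  # 3
length(quoted)    # 4
```

With the quoted form, the fourth element lines up with the empty trailing column and tells read.table to skip it.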
flodel answered Nov 14 '22 21:11