I'd like to add metadata to my spreadsheet as comments, and have R ignore these afterwards.
My data are of the form
v1,v2,v3,
1,5,7,
4,2,1,#possible error,
(with the exception that it is much longer: the first comment appears well outside the first 5 rows, which read.table uses to determine the number of columns)
I've been trying:
read.table("data.name",header=TRUE,sep=",",stringsAsFactors=FALSE,comment.char="#")
But read.table (and, for that matter, count.fields) thinks that I have one more field than I actually do. My data frame ends up with a blank column called 'X'. I think this is because my spreadsheet program adds a comma to the end of every line (as in the example above).
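One way to confirm that the trailing commas are the cause is to count the fields line by line. This is a minimal sketch that feeds the three sample lines in through textConnection() instead of reading the real file (the variable names are illustrative):

```r
# The three sample lines from above, supplied inline instead of from a file.
lines <- c("v1,v2,v3,",
           "1,5,7,",
           "4,2,1,#possible error,")

# count.fields() drops everything after "#" and then counts comma-separated
# fields on each line. The trailing comma adds an empty extra field,
# so every line reports one field more than there are real columns.
n_fields <- count.fields(textConnection(lines), sep = ",", comment.char = "#")
print(n_fields)
```

Every line should report the same count, one more than the three real columns. This is why read.table creates the extra, empty column.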
Using flush=TRUE has no effect, even though (according to the help file) it "[...] allows putting comments after the last field [...]".
Using colClasses=c(rep(NA,3),NULL) has no effect either.
I could just delete the column afterwards, but since it seems that this is a common practice I'd like to learn how to do it properly.
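For comparison, the "delete it afterwards" workaround could be sketched as follows. It uses read.table's text= argument with the sample data in place of the real file. It drops the last column by position rather than by the auto-generated name 'X', so it does not depend on how read.table names the empty column:

```r
txt <- "v1,v2,v3,
1,5,7,
4,2,1,#possible error,"

# Read everything, including the empty column created by the trailing commas.
df <- read.table(text = txt, header = TRUE, sep = ",",
                 stringsAsFactors = FALSE, comment.char = "#")

# Remove the last column by position.
df <- df[-ncol(df)]
print(df)
```

This works, but it reads a column only to throw it away.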
Thanks,
Andrew
From the documentation (?read.table):
colClasses character. A vector of classes to be assumed for the columns. Recycled as necessary, or if the character vector is named, unspecified values are taken to be NA.
Possible values are NA (the default, when type.convert is used), "NULL" (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or "factor", "Date" or "POSIXct". Otherwise there needs to be an as method (from package methods) for conversion from "character" to the specified formal class.
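Note that it says to use "NULL" (a character string), not NULL. A bare NULL simply vanishes inside c(), so c(rep(NA,3),NULL) is the same as rep(NA,3), which is then recycled to NA for all four columns. With "NULL" in the fourth position, this works as expected: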
con <- textConnection("
v1,v2,v3,
1,5,7,
4,2,1,#possible error,
")
read.table(con, header = TRUE, sep = ",",
           stringsAsFactors = FALSE, comment.char = "#",
           colClasses = c(rep(NA, 3), "NULL"))  # "NULL" skips the 4th (empty) column
#   v1 v2 v3
# 1  1  5  7
# 2  4  2  1
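Comparing the lengths of the two colClasses vectors directly shows why only the second one skips the last column. The variable names here are illustrative:

```r
# Vector from the question: NULL contributes no element to c().
from_question <- c(rep(NA, 3), NULL)

# Vector from the answer: "NULL" is an ordinary string element.
from_answer <- c(rep(NA, 3), "NULL")

print(length(from_question))  # too short, so read.table recycles NA over all columns
print(length(from_answer))    # one entry per column; the 4th says "skip"
```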