I have found an issue where R seems to interpret "T" as TRUE even though I have used every means I know of to avoid it (at least according to this post).
Example data (saved as "test.txt"):
col1 col2
1 T
2 T
3 T
4 T
5 T
6 T
7 T
8 T
9 T
Example code:
read.table("test.txt", as.is=TRUE, header=TRUE,
stringsAsFactors=FALSE, colClasses=c(character()))
Produces:
col1 col2
1 1 TRUE
2 2 TRUE
3 3 TRUE
4 4 TRUE
5 5 TRUE
6 6 TRUE
7 7 TRUE
8 8 TRUE
9 9 TRUE
The only (non-ideal) workaround I found was to set header=FALSE:
read.table("test.txt", as.is=TRUE, header=FALSE,
stringsAsFactors=FALSE,
colClasses=c(character()))
V1 V2
1 col1 col2
2 1 T
3 2 T
4 3 T
5 4 T
6 5 T
7 6 T
8 7 T
9 8 T
10 9 T
I realize this may seem somewhat contrived, but the edge case is genuine: there actually is a human gene named "T" (!), and the values in col1 are positions within that gene.
Thanks in advance for the help
What makes you think this happens "unexpectedly"?
R guesses column types for you (and that is generally helpful), but if you know better, use the colClasses=... argument to tell R.
R> res <- read.table(textConnection("col1 col2\n1 T\n2 T\n3 T"),
+ header=TRUE, colClasses=c("numeric", "character"))
R> res
col1 col2
1 1 T
2 2 T
3 3 T
R> sapply(res, class)
col1 col2
"numeric" "character"
R>
Your post was a little oddly formatted, so I didn't see at first that you did in fact specify colClasses. Despite the recycling rule, I always recommend supplying a vector with as many entries as you have columns.
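One detail that may explain why the colClasses in the question had no effect: c(character()) is a zero-length character vector, not the string "character", so there is nothing to recycle and R appears to fall back to guessing the types. The sketch below illustrates this on a shortened copy of the example data. The variable names (txt, guessed, fixed, named) are made up for illustration, and the inline text= argument stands in for the "test.txt" file.

```r
# c(character()) creates an EMPTY character vector; it is not the same
# as the class name "character".
length(c(character()))                    # zero entries to recycle
identical(c(character()), character(0))

# Shortened copy of the example data, supplied inline instead of via a file.
txt <- "col1 col2\n1 T\n2 T\n3 T"

# Empty colClasses: R still guesses, so "T" is converted to logical.
guessed <- read.table(text = txt, header = TRUE,
                      colClasses = c(character()))
class(guessed$col2)

# One class per column, as recommended above: col2 stays character.
fixed <- read.table(text = txt, header = TRUE,
                    colClasses = c("integer", "character"))
class(fixed$col2)

# Alternative: name only the column to pin down; unnamed columns are
# still guessed by R.
named <- read.table(text = txt, header = TRUE,
                    colClasses = c(col2 = "character"))
sapply(named, class)
```

Writing colClasses="character" (a quoted string) would also work here, since a single class is recycled across all columns, but col1 would then be read as character too.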