 

read.table unexpectedly interprets "T" as TRUE

Tags:

r

I have found an issue where R seems to interpret "T" as TRUE even though I am using every means I know of to prevent it (at least according to this post).

Example data (saved as "test.txt"):

col1    col2
1   T
2   T
3   T
4   T
5   T
6   T
7   T
8   T
9   T

Example code:

read.table("test.txt", as.is=TRUE, header=TRUE, 
   stringsAsFactors=FALSE, colClasses=c(character())) 

Produces:

  col1 col2
1    1 TRUE
2    2 TRUE
3    3 TRUE
4    4 TRUE
5    5 TRUE
6    6 TRUE
7    7 TRUE
8    8 TRUE
9    9 TRUE
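One likely reason the colClasses argument has no effect in the call above: `character()` creates an empty (zero-length) character vector, so `c(character())` does not pass the class name "character" at all. The sketch below assumes, as the output above suggests, that read.table() then treats every column's class as missing and falls back to guessing types with the base function type.convert(). The data is inlined through read.table's `text` argument instead of test.txt; the names `cc` and `txt` are purely illustrative.

```r
# character() returns a zero-length character vector, and c() keeps it
# empty, so this does NOT supply the class name "character".
cc <- c(character())
print(length(cc))          # 0

# The question's data, inlined via read.table's `text` argument.
txt <- "col1 col2\n1 T\n2 T\n3 T"

# With an empty colClasses, read.table() falls back to guessing types.
res <- read.table(text = txt, header = TRUE, colClasses = cc)
print(sapply(res, class))  # col1 is "integer", col2 is "logical"

# The guessing is done by type.convert(), which accepts "T" as a logical.
print(type.convert(c("T", "T"), as.is = TRUE))  # TRUE TRUE
```

The same guessing rules also turn "F", "TRUE", "true" and similar strings into logicals, so a gene named "F" would be affected in the same way.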

The only (non-ideal) workaround I found was to set header=FALSE:

read.table("test.txt", as.is=TRUE, header=FALSE, 
    stringsAsFactors=FALSE,
    colClasses=c(character()))        


     V1   V2
1  col1 col2
2     1    T
3     2    T
4     3    T
5     4    T
6     5    T
7     6    T
8     7    T
9     8    T
10    9    T
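The header=FALSE workaround most likely succeeds only as a side effect: the header line becomes the first data row, and a column that contains "col2" cannot be read as logical, so the type guesser keeps the whole column as character. A minimal sketch of that reasoning using base type.convert() directly; the vector names are hypothetical.

```r
# With header = FALSE, the strings "col1"/"col2" end up inside the data.
with_header_row <- c("col2", "T", "T")
without_header_row <- c("T", "T")

# A single non-logical value forces the whole column to character ...
print(class(type.convert(with_header_row, as.is = TRUE)))     # "character"

# ... whereas a column containing only "T" values is read as logical.
print(class(type.convert(without_header_row, as.is = TRUE)))  # "logical"
```

This also explains why the workaround is fragile: the header row has to be stripped afterwards, and col1 comes back as character too.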

I realize this may seem somewhat contrived, but the edge case is genuine: a human gene is actually named "T" (!), and the values in col1 are positions within that gene.

Thanks in advance for the help.

asked Oct 28 '13 20:10 by Vince


1 Answer

What makes you think this happens "unexpectedly"?

R guesses the column types for you (and that is generally helpful), but if you know better, use the colClasses=... argument to tell R.

R> res <- read.table(textConnection("col1 col2\n1 T\n2 T\n3 T"), 
+                    header=TRUE, colClasses=c("numeric", "character"))
R> res
  col1 col2
1    1    T
2    2    T
3    3    T
R> sapply(res, class)
       col1        col2 
  "numeric" "character" 
R>

Your post was a little oddly formatted, so I didn't see at first that you did in fact specify colClasses. Despite the recycling rule, I always recommend supplying a vector with as many entries as you have columns.
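Applied to the question's own call, a minimal sketch passes the class names as quoted strings, one per column. It assumes col1 (gene positions) should be integer and col2 should stay as text, and it inlines the data with read.table's `text` argument instead of reading test.txt; `txt` is an illustrative name.

```r
txt <- "col1 col2\n1 T\n2 T\n3 T"

# One quoted class name per column; "integer" for col1 is an assumption
# based on col1 holding positions within the gene.
res <- read.table(text = txt, header = TRUE,
                  colClasses = c("integer", "character"))

print(res$col2)            # "T" "T" "T"
print(sapply(res, class))  # col1 is "integer", col2 is "character"
```

Because every column's class is given explicitly, no type guessing happens, so "T" stays a string and the header can remain in place.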

answered Nov 15 '22 03:11 by Dirk Eddelbuettel