
Unexpected behaviour in indexing data.frame by row name

Tags: r, subset

I don't index a data.frame by row name very often, but it is sometimes convenient. However, I got an unexpected result when I tried to select a non-existent row:

  test <- data.frame(a = c("a", "b", "c"), 
                     b = c("A", "B", "C"), 
                     row.names = c(-99.5, 99.5, 99))
  test["-99", ]

I would expect it to return

     a    b
NA <NA> <NA>

but it returns

      a b
-99.5 a A

For reference, my session info:

Session info ---------------------------------------------------------------
 setting  value                       
 version  R version 3.2.1 (2015-06-18)
 system   x86_64, mingw32             
 ui       RStudio (0.99.441)          
 language (EN)                        
 collate  English_United Kingdom.1252 
 tz       Europe/London  

Any ideas?

kismsu asked Aug 05 '15 15:08


1 Answer

This is indeed unexpected.

The answer to this lies in the partial matching of row names when indexing:

mtcars["Val", ]

This returns the "Valiant" row. It won't work for columns; the following fails with an "undefined columns selected" error:

mtcars[ ,"cy"]
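
The result in the question is consistent with how pmatch() resolves a string against the row names, so here is a small sketch of that lookup on the question's data. The variable names (rn, partial, exact, ambiguous) are only illustrative, and stringsAsFactors = FALSE is an assumption added so the columns stay character on any R version.

```r
# Same data as in the question; stringsAsFactors = FALSE is an added assumption
test <- data.frame(a = c("a", "b", "c"),
                   b = c("A", "B", "C"),
                   row.names = c(-99.5, 99.5, 99),
                   stringsAsFactors = FALSE)

rn <- rownames(test)            # row names are stored as character strings

partial   <- pmatch("-99", rn)  # unique partial match on "-99.5"
exact     <- pmatch("99", rn)   # an exact match takes precedence over "99.5"
ambiguous <- pmatch("9", rn)    # matches both "99.5" and "99", so no result

print(c(partial = partial, exact = exact, ambiguous = ambiguous))
```

So "-99" is not an exact row name, but it is an unambiguous prefix of "-99.5", which is why that row comes back.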

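To avoid the partial matching, I'd subset with an exact comparison (for a missing row name this returns a zero-row data.frame rather than a row of NAs):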

subset(test, rownames(test) == "-99")
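
As a rough illustration of the exact comparison, the same filter can be written with plain logical indexing. This is a sketch on the question's data; missing_row and present_row are made-up names, and stringsAsFactors = FALSE is an added assumption.

```r
# Same data as in the question; stringsAsFactors = FALSE is an added assumption
test <- data.frame(a = c("a", "b", "c"),
                   b = c("A", "B", "C"),
                   row.names = c(-99.5, 99.5, 99),
                   stringsAsFactors = FALSE)

# == compares whole strings, so a prefix such as "-99" never matches "-99.5"
missing_row <- test[rownames(test) == "-99", ]
present_row <- test[rownames(test) == "99", ]

print(nrow(missing_row))  # number of rows selected for the missing name
print(present_row)        # only the row whose name is exactly "99"
```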

Edit: It is indeed documented in ?"[.data.frame"

Both [ and [[ extraction methods partially match row names. By default neither partially match column names, but [[ will if exact = FALSE (and with a warning if exact = NA). If you want to exact matching on row names use match, as in the examples.

To use match on your data:

test[match("-99", row.names(test)), ]
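
If you need this in several places, the match() idiom could be wrapped in a small helper. The function below, exact_rows, is a hypothetical name and not part of base R; it is only a sketch of the idea, again assuming stringsAsFactors = FALSE.

```r
# Hypothetical helper: select rows by exact row name only.
# Names that are not present come back as rows filled with NA.
exact_rows <- function(df, keys) {
  df[match(keys, rownames(df)), , drop = FALSE]
}

# Same data as in the question; stringsAsFactors = FALSE is an added assumption
test <- data.frame(a = c("a", "b", "c"),
                   b = c("A", "B", "C"),
                   row.names = c(-99.5, 99.5, 99),
                   stringsAsFactors = FALSE)

result <- exact_rows(test, c("-99", "99"))
print(result)
```

Unlike the == comparison, this keeps one output row per requested name, so missing names stay visible as NA rows.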
jeremycg answered Oct 05 '22 22:10