 

Selecting rows and ordering the result in R

The seemingly trivial task of selecting rows in a data frame and then ordering them is eluding me and driving me crazy. For example, take this trivial data frame:

country = c("US", "US", "CA", "US")
company = c("Apple", "Google", "RIM", "MS")
vals = c(100, 70, 50, 90)
df <- data.frame(country, company, vals)

Let's order it by vals:

> df[order(vals),]
  country company vals
3      CA     RIM   50
2      US  Google   70
4      US      MS   90
1      US   Apple  100

Works perfectly. Now let's try to select only the US companies and order their values. We get a bogus result:

> df[country=="US", ][order(vals),]
    country company vals
4       US      MS   90
2       US  Google   70
NA    <NA>    <NA>   NA
1       US   Apple  100
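One plausible reading of this bogus output, sketched below: order(vals) uses the global vals vector rather than the subset, so it returns positions 3, 2, 4, 1 into the original four rows. The US-only data frame has just three rows, so position 3 picks MS and position 4 is out of range, which produces the NA row. Ordering by the subset's own column, after the subset has been taken, avoids the mismatch. The names idx, us and us_sorted are illustrative.

```r
country <- c("US", "US", "CA", "US")
company <- c("Apple", "Google", "RIM", "MS")
vals <- c(100, 70, 50, 90)
df <- data.frame(country, company, vals)

# order() is applied to the global `vals` vector (4 elements), so it
# returns positions into all four rows of df
idx <- order(vals)
print(idx)  # [1] 3 2 4 1

# The US-only subset has just 3 rows (row names "1", "2", "4")
us <- df[df$country == "US", ]

# Indexing it with idx: position 3 is MS, position 4 is out of range (NA row)
print(us[idx, ])

# Fix: subset first, then order by the subset's own column
us_sorted <- us[order(us$vals), ]
print(us_sorted)
```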

Let's order and then select. Again, a bogus result:

> df[order(vals),][country=="US", ]
  country company vals
3      CA     RIM   50
2      US  Google   70
1      US   Apple  100
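A similar sketch for this second attempt: country == "US" is evaluated in the original row order (TRUE, TRUE, FALSE, TRUE), but it is applied to the rows after sorting (3, 2, 4, 1), so it keeps RIM and drops MS. Either permuting the mask with the same ordering, or computing the mask from the sorted frame itself, makes the two line up. The names o, mask, sorted and us_sorted are illustrative.

```r
country <- c("US", "US", "CA", "US")
company <- c("Apple", "Google", "RIM", "MS")
vals <- c(100, 70, 50, 90)
df <- data.frame(country, company, vals)

o <- order(df$vals)   # row order after sorting: 3 2 4 1
sorted <- df[o, ]

# The mask follows the ORIGINAL row order: TRUE TRUE FALSE TRUE
mask <- df$country == "US"

# Option 1: permute the mask the same way as the rows
us_sorted <- sorted[mask[o], ]

# Option 2: build the mask from the sorted frame itself
us_sorted2 <- sorted[sorted$country == "US", ]

print(us_sorted)
```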

How do I get a data frame that includes only US companies and is sorted by vals?

Ash asked Feb 01 '11 14:02


2 Answers

I'm not sure you can do this via a chain of subsetting calls to [, as the second subsetting call needs to refer to the ordered or reduced data frame. One way is to order the data and pass the result to subset() to choose rows from this ordered data frame:

> with(df, subset(df[order(vals),], subset = country == "US"))
  country company vals
2      US  Google   70
4      US      MS   90
1      US   Apple  100
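In the call above, with(df, ...) is what lets order(vals) find the vals column; in the question, the bare order(vals) only worked because a global vals vector happened to exist. A sketch of an equivalent call without with(), referring to the column explicitly as df$vals (subset() still evaluates country inside the ordered data frame):

```r
df <- data.frame(
  country = c("US", "US", "CA", "US"),
  company = c("Apple", "Google", "RIM", "MS"),
  vals = c(100, 70, 50, 90)
)

# This block defines no global `vals` vector, so the column is named
# explicitly via df$vals instead of relying on with()
us_sorted <- subset(df[order(df$vals), ], country == "US")
print(us_sorted)
```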
Gavin Simpson answered Oct 29 '22 14:10


I always found it odd that base R didn't have a convenience function for reordering a data frame like it does for subsetting (subset()). So I wrote my own, arrange() in the plyr package:

library(plyr)
arrange(subset(df, country == "US"), vals)
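For readers without plyr, a rough base-R approximation of the same idea is sketched below. arrange_base() is a hypothetical helper written for illustration, not part of base R or plyr: it evaluates its sort expressions inside the data frame and passes them on to order(). It is not a reimplementation of plyr's arrange().

```r
# Hypothetical helper, not part of base R or plyr: reorders the rows of
# `data` by one or more expressions evaluated inside the data frame.
arrange_base <- function(data, ...) {
  # Capture the unevaluated sort expressions, e.g. list(vals)
  exprs <- substitute(list(...))
  # Evaluate them with the data frame's columns in scope
  keys <- eval(exprs, data, parent.frame())
  # unname() so a named key cannot be matched to order()'s own arguments
  data[do.call(order, unname(keys)), , drop = FALSE]
}

df <- data.frame(
  country = c("US", "US", "CA", "US"),
  company = c("Apple", "Google", "RIM", "MS"),
  vals = c(100, 70, 50, 90)
)

us_sorted <- arrange_base(subset(df, country == "US"), vals)
print(us_sorted)

# Negating a numeric key sorts it in decreasing order
print(arrange_base(df, country, -vals))
```

Because the keys go straight to order(), ties on the first key are broken by the next one, as in the last call (country ascending, then vals descending).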
hadley answered Oct 29 '22 13:10