The seemingly trivial task of selecting rows in a data frame and then ordering them is eluding me, and driving me crazy at the same time. For example, let's take a trivial data frame:
country = c("US", "US", "CA", "US")
company = c("Apple", "Google", "RIM", "MS")
vals = c(100, 70, 50, 90)
df <- data.frame(country, company, vals)
Let's order it by vals:
> df[order(vals),]
country company vals
3 CA RIM 50
2 US Google 70
4 US MS 90
1 US Apple 100
Works perfectly. Let's now try to select only US companies and order their values. We get a bogus result:
> df[country=="US", ][order(vals),]
country company vals
4 US MS 90
2 US Google 70
NA <NA> <NA> NA
1 US Apple 100
Let's order first, and then select. Again, a bogus result:
> df[order(vals),][country=="US", ]
country company vals
3 CA RIM 50
2 US Google 70
1 US Apple 100
How do I get a data frame that includes only US companies and is sorted by vals?
I'm not sure you can do this via a chain of subsetting calls to [, as the second subsetting call needs to refer to the ordered or reduced data frame rather than the original. One way is to order the data and supply this to subset() to choose rows from this ordered data frame:
> with(df, subset(df[order(vals),], subset = country == "US"))
country company vals
2 US Google 70
4 US MS 90
1 US Apple 100
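For what it's worth, the chained attempts in the question seem to fail because the free-standing vectors country and vals still have four elements and still follow the original row order, so they no longer line up with the intermediate data frame. In the first attempt, order(vals) returns 3 2 4 1, but the US-only frame has just three rows, so index 4 produces the NA row. A minimal base R sketch that avoids this by storing the intermediate result and indexing by its own columns (the names us and us_sorted are only illustrative, not from the original posts):

```r
# Rebuild the example data frame from the question
country <- c("US", "US", "CA", "US")
company <- c("Apple", "Google", "RIM", "MS")
vals <- c(100, 70, 50, 90)
df <- data.frame(country, company, vals)

# Step 1: keep only the US rows, using the data frame's own column
us <- df[df$country == "US", ]

# Step 2: order by the vals column of the *reduced* data frame,
# so the index vector has the same length as the frame it indexes
us_sorted <- us[order(us$vals), ]

print(us_sorted)
```

The rows should come back in the order Google, MS, Apple, with their original row names (2, 4, 1) preserved.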
I always found it odd that base R didn't have a convenience for reordering a data frame like it does for subsetting. So I wrote my own:
library(plyr)
arrange(subset(df, country == "US"), vals)