
Why won't dplyr's top_n() work?

Tags: r, dplyr

I have a dataframe called df:

City,State,Price,Dogs
Portland,OR,75,1
Portland,OR,100,3
San Diego,CA,12,4
San Diego,CA,23,5
...

I used dplyr's group_by() and summarise() to get the median price and the total number of dogs for each city:

df.median <- summarise(
  group_by(
    df, 
    State, 
    City
  ),
  MEDIAN_PRICE = median(Price),
  SUM_DOGS = sum(Dogs)
)

But when I run top_n(df.median, 100, SUM_DOGS), R does not give me cities with the 100 highest values in SUM_DOGS. It just returns df.median.

Why?

asked Mar 31 '16 18:03 by Username


1 Answer

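You likely need to ungroup first. summarise() drops only the last grouping variable (City), so df.median is still grouped by State, and top_n() picks the top 100 rows within each State rather than from the whole dataset. If no State has more than 100 cities, that returns every row.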

top_n(ungroup(df.median), 100, SUM_DOGS)
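
To see the difference on a small scale, here is a minimal sketch. The six-row data frame is made up for illustration (only its column names come from the question), and n is 1 instead of 100 so the effect is visible on so few rows.

```r
library(dplyr)

# Hypothetical toy data with the same columns as df in the question.
df <- data.frame(
  City  = c("Portland", "Portland", "Eugene", "San Diego", "San Diego", "Fresno"),
  State = c("OR", "OR", "OR", "CA", "CA", "CA"),
  Price = c(75, 100, 40, 12, 23, 60),
  Dogs  = c(1, 3, 2, 4, 5, 1)
)

df.median <- summarise(
  group_by(df, State, City),
  MEDIAN_PRICE = median(Price),
  SUM_DOGS = sum(Dogs)
)

# summarise() removes only the last grouping variable (City),
# so the result is still grouped by State.
group_vars(df.median)

# Still grouped: top_n() keeps the top row within each State.
grouped_top <- top_n(df.median, 1, SUM_DOGS)

# Ungrouped: top_n() keeps the top row of the whole table.
overall_top <- top_n(ungroup(df.median), 1, SUM_DOGS)
```

Here grouped_top has one row per State, while overall_top has a single row. In dplyr 1.0.0 and later, top_n() is superseded by slice_max(), which follows the same grouping rules, so the ungroup() step should still be needed there.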
answered Sep 29 '22 09:09 by aosmith