
How to get top n companies from a data frame in decreasing order

I am trying to get the top 'n' companies from a data frame. Here is my code so far.

data("Forbes2000", package = "HSAUR")
sort(Forbes2000$profits,decreasing=TRUE)

Now I would like to get the top 50 observations from this sorted vector.

Teja asked Aug 29 '12 23:08


4 Answers

head and tail are really useful functions!

head(sort(Forbes2000$profits,decreasing=TRUE), n = 50)

If you want the first 50 rows of the data.frame, you can use the arrange function from plyr to sort the data.frame and then use head:

library(plyr)

head(arrange(Forbes2000,desc(profits)), n = 50)

Notice that I wrapped profits in a call to desc, which means it will sort in decreasing order.

To do the same without plyr:

head(Forbes2000[order(Forbes2000$profits, decreasing = TRUE), ], n = 50)
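
As a minimal sketch of how the vector and data.frame approaches treat missing values, here is a hypothetical four-row data frame (companies and its contents are made up for illustration, not taken from Forbes2000). By default sort() drops NA values, while order() places them last, so in both cases the largest profits come first:

```r
# Hypothetical toy data standing in for Forbes2000 (one profit is missing)
companies <- data.frame(
  name    = c("A", "B", "C", "D"),
  profits = c(7.1, NA, 9.5, 3.0),
  stringsAsFactors = FALSE
)

# Top 2 profit values only: sort() removes the NA before sorting
top_values <- head(sort(companies$profits, decreasing = TRUE), n = 2)

# Top 2 full rows: order() returns row indices, with the NA row placed last
top_rows <- head(companies[order(companies$profits, decreasing = TRUE), ], n = 2)

print(top_values)  # 9.5 7.1
print(top_rows)    # rows for companies C and A
```

The first form only gives you the numbers; the second keeps the company names and every other column alongside them.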

mnel answered Oct 17 '22 07:10


Use order to sort the data.frame, then use head to get only the first 50 rows.

data("Forbes2000", package = "HSAUR")
head(Forbes2000[order(Forbes2000$profits, decreasing=TRUE), ], 50)

GSee answered Oct 17 '22 06:10


You can use base R's rank together with filter and desc from dplyr.

    library(dplyr)
    top_fifty <- Forbes2000 %>%
         filter(rank(desc(profits))<=50)

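This ranks the rows by descending profit and keeps only those whose rank is less than or equal to 50 (i.e. the top 50). Note that filter does not reorder the rows, so the result stays in its original row order, and tied profits at the cutoff can give you slightly more or fewer than 50 rows.
dplyr is very useful. The commands and chaining syntax are very easy to understand. 10/10 would recommend.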
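
A minimal sketch of the difference between filtering by rank and sorting, assuming dplyr is installed and using a hypothetical five-row data frame (companies, top3_filter and top3_sorted are illustrative names only):

```r
library(dplyr)

# Hypothetical toy data standing in for Forbes2000
companies <- data.frame(
  name    = c("A", "B", "C", "D", "E"),
  profits = c(7.1, 9.5, 1.2, 3.0, 5.4),
  stringsAsFactors = FALSE
)

# filter() + rank() keeps the top 3 rows but leaves them in original row order
top3_filter <- companies %>% filter(rank(desc(profits)) <= 3)

# arrange() + head() returns the same rows, sorted by decreasing profit
top3_sorted <- companies %>% arrange(desc(profits)) %>% head(3)

print(top3_filter$name)  # "A" "B" "E"
print(top3_sorted$name)  # "B" "A" "E"
```

If you need the result in decreasing order, the arrange() form is probably the safer choice. Recent dplyr versions also provide slice_max(), which may be a more direct fit.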

Vince answered Oct 17 '22 08:10


mnel is right that, in general, you want to use the head() and tail() functions along with a sorting function. I should mention, though, that for medium-sized data sets Vince's method works faster. If you don't use head() or tail(), you can use the basic subsetting operator []:

library(plyr)
# Sort with plyr's arrange(), then take the first 50 rows
x <- arrange(Forbes2000, desc(profits))
x <- x[1:50, ]
# Or using order() from base R
x <- Forbes2000[order(Forbes2000$profits, decreasing = TRUE), ]
x <- x[1:50, ]

However, I really do recommend the head(), tail(), or filter() functions, because the plain [] operator assumes the rows you ask for actually exist: x[1:50, ] pads the result with NA rows when the data has fewer than 50 rows, whereas head() simply returns what is there. (Hopefully, this answers Teja's question.)
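
A minimal sketch of how [] and head() differ when fewer rows exist than requested, using a hypothetical three-row data frame (small, sorted, by_index and by_head are illustrative names only):

```r
# Hypothetical toy data with only 3 rows
small <- data.frame(
  name    = c("A", "B", "C"),
  profits = c(2, 8, 5),
  stringsAsFactors = FALSE
)
sorted <- small[order(small$profits, decreasing = TRUE), ]

# Asking for 5 rows by index pads the result with 2 all-NA rows
by_index <- sorted[1:5, ]

# head() returns only the 3 rows that exist
by_head <- head(sorted, 5)

print(nrow(by_index))  # 5
print(nrow(by_head))   # 3
```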

Now, which package you choose is largely subjective. However, reading people's comments, I will say that the choice between plyr's arrange(), {base}'s order() with {utils}'s head() and tail(), or dplyr's filter() largely depends on the memory size and row count of your dataset. I could go into more detail about how plyr and sometimes dplyr have problems with large, complex datasets, but I don't want to get off topic.

P.S. This is one of my first times answering so feedback is appreciated.

mlane answered Oct 17 '22 06:10