I am trying to get the top 'n' companies from a data frame. Here is my code:
data("Forbes2000", package = "HSAUR")
sort(Forbes2000$profits,decreasing=TRUE)
Now I would like to get the top 50 observations from this sorted vector.
head and tail are really useful functions!
head(sort(Forbes2000$profits,decreasing=TRUE), n = 50)
If you want the first 50 rows of the data.frame, then you can use the arrange function from plyr to sort the data.frame and then use head:
library(plyr)
head(arrange(Forbes2000,desc(profits)), n = 50)
Notice that I wrapped profits in a call to desc, which means it will sort in decreasing order.
To do the same without plyr:
head(Forbes2000[order(Forbes2000$profits, decreasing = TRUE), ], n = 50)
Use order to sort the data.frame, then use head to get only the first 50 rows.
data("Forbes2000", package = "HSAUR")
head(Forbes2000[order(Forbes2000$profits, decreasing=TRUE), ], 50)
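As a minimal sketch, the same order-then-head idea can be wrapped in a reusable function. The name top_n_rows and the toy data frame below are made up for illustration and are not part of any package:

```r
# Hypothetical helper: return the n rows with the largest values in `col`.
top_n_rows <- function(df, col, n = 50) {
  # order() puts NA last by default, so missing values end up at the bottom
  idx <- order(df[[col]], decreasing = TRUE)
  # head() simply returns fewer rows if n is larger than nrow(df)
  head(df[idx, , drop = FALSE], n)
}

# Toy data for illustration only
companies <- data.frame(
  name    = c("A", "B", "C", "D", "E"),
  profits = c(3.2, NA, 10.5, -1.0, 7.8)
)

top3 <- top_n_rows(companies, "profits", n = 3)
print(top3)
```

With the real data this would be called as top_n_rows(Forbes2000, "profits", 50).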
You can use rank (a base R function) together with filter and desc from dplyr.
library(dplyr)
top_fifty <- Forbes2000 %>%
filter(rank(desc(profits))<=50)
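This ranks the rows by profits in descending order and keeps only those whose rank is less than or equal to 50 (i.e. the top 50). Note that filter does not reorder the rows, so the result stays in the original row order; pipe it into arrange(desc(profits)) if you also want it sorted.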
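To see what a rank-based filter does, here is a small base R sketch on a made-up vector. Negating the values stands in for desc() here, and the ties.method choice is an assumption about what you want when several companies share the same profit:

```r
# Toy vector for illustration only
profits <- c(5, 9, 7, 9, 1)

# Negating the vector makes the largest value rank first.
# The default ties.method = "average" gives tied values a fractional rank.
r_avg <- rank(-profits)                       # 4.0 1.5 3.0 1.5 5.0
r_min <- rank(-profits, ties.method = "min")  # 4 1 3 1 5

# Keeping rank <= 3 preserves the original order of the elements
top3 <- profits[r_min <= 3]                   # 9 7 9

# Sort afterwards if you need the result in descending order
top3_sorted <- sort(top3, decreasing = TRUE)  # 9 9 7
```

Because ties share a rank, a rank-based filter can return more or fewer than n elements, whereas head on sorted data always returns at most n.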
dplyr is very useful. The commands and chaining syntax are very easy to understand. 10/10 would recommend.
Mnel is right that, in general, you want to use the head() and tail() functions along with a sorting function. I should mention, though, that for medium-sized data sets Vince's method works faster. If you didn't use head() or tail(), you could use the basic subsetting operator []:
library(plyr)
x <- arrange(Forbes2000, desc(profits))
x <- x[1:50, ]
# Or using order()
x <- Forbes2000[order(Forbes2000$profits, decreasing = TRUE), ]
x <- x[1:50, ]
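However, I really do recommend the head(), tail(), or filter() functions, because the regular [] operator assumes your data has the shape you expect: if the data frame has fewer than 50 rows, x[1:50, ] pads the result with NA rows, whereas head(x, 50) simply returns the rows that exist. (Hopefully, this answers Teja's question.)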
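A quick sketch of that difference between [] and head(), using a made-up three-row data frame:

```r
# Toy data frame with fewer rows than we ask for (illustration only)
small <- data.frame(id = 1:3, profits = c(2.5, 9.1, 4.0))

# Indexing past the last row pads the result with NA rows
by_index <- small[1:5, ]
nrow(by_index)                 # 5
sum(is.na(by_index$profits))   # 2

# head() stops at the last available row
by_head <- head(small, 5)
nrow(by_head)                  # 3
```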
Now, which package you choose is largely subjective. However, reading people's comments, I will say that the choice between plyr's arrange(), {base}'s order() with {utils}'s head() and tail(), or dplyr largely depends on the memory size and row count of your dataset. I could go into more detail about how plyr and sometimes dplyr have problems with large, complex datasets, but I don't want to get off topic.
P.S. This is one of my first times answering so feedback is appreciated.