Add a "rank" column to a data frame

Tags:

r

I have a dataframe with counts of different items, in different years:

df <- data.frame(item = rep(c('a','b','c'), 3),
                 year = rep(c('2010','2011','2012'), each=3),
                 count = c(1,4,6,3,8,3,5,7,9))

I would like to add a "year.rank" column that gives each item's rank within its year, where the highest count in a year gets rank 1. With the data above, the result would look like:

  item year count year.rank
1    a 2010     1         3
2    b 2010     4         2
3    c 2010     6         1
4    a 2011     3         2
5    b 2011     8         1
6    c 2011     3         3
7    a 2012     5         3
8    b 2012     7         2
9    c 2012     9         1

I know I could do this for the whole data frame using order(df$count), but I'm not sure how I would do it by year.
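As a side note on the whole-data-frame case, order() and rank() return different things, which matters once the ranking is done per group. A minimal sketch using the same counts (the variable names `ord` and `rnk` are only illustrative):

```r
count <- c(1, 4, 6, 3, 8, 3, 5, 7, 9)

# order() returns the indices that would sort the vector (largest first here).
ord <- order(count, decreasing = TRUE)

# rank() returns each element's position in that sorted sequence.
# Negating the counts makes the largest count rank 1.
rnk <- rank(-count, ties.method = "first")

print(ord)
print(rnk)
```

The two vectors are inverse permutations of each other: `ord` says which row comes first, while `rnk` says where each row ends up.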

Wilduck asked Mar 02 '13 04:03



2 Answers

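The rank function does this. Wrap it in ave() to apply it separately within each year, and negate the counts so that the highest count gets rank 1: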

transform(df,
          year.rank = ave(count, year,
                          FUN = function(x) rank(-x, ties.method = "first")))

  item year count year.rank
1    a 2010     1         3
2    b 2010     4         2
3    c 2010     6         1
4    a 2011     3         2
5    b 2011     8         1
6    c 2011     3         3
7    a 2012     5         3
8    b 2012     7         2
9    c 2012     9         1
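The 2011 rows contain a tie (items a and c both have a count of 3), and ties.method = "first" breaks it by order of appearance. If tied items should share a rank instead, ties.method = "min" is one option. A small sketch comparing the two on the same data (the names `rank_first` and `rank_min` are only illustrative):

```r
df <- data.frame(item = rep(c('a','b','c'), 3),
                 year = rep(c('2010','2011','2012'), each = 3),
                 count = c(1,4,6,3,8,3,5,7,9))

# Ties broken by order of appearance within each year.
rank_first <- ave(df$count, df$year,
                  FUN = function(x) rank(-x, ties.method = "first"))

# Tied counts share the smallest rank in their group.
rank_min <- ave(df$count, df$year,
                FUN = function(x) rank(-x, ties.method = "min"))

print(cbind(df, rank_first, rank_min))
```

With "min", both tied items in 2011 get rank 2 and no item in that year gets rank 3. Which behaviour is right depends on how the ranks are used afterwards.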
A5C1D2H2I1M1N2O1R2T1 answered Sep 21 '22 13:09


data.table version for practice:

library(data.table)
DT <- as.data.table(df)
DT[,yrrank:=rank(-count,ties.method="first"),by=year]

   item year count yrrank
1:    a 2010     1      3
2:    b 2010     4      2
3:    c 2010     6      1
4:    a 2011     3      2
5:    b 2011     8      1
6:    c 2011     3      3
7:    a 2012     5      3
8:    b 2012     7      2
9:    c 2012     9      1
thelatemail answered Sep 21 '22 13:09