
Ranking dataframe columns in R

I have a data frame; below is sample data from it.

Company     Category    Margin
SBI             BK      34.5
PNB             BK      39.5
UCO BANK        BK      39.9
BANK            BK      41.3
INDIAN BANK     BK      42.3
DENA BANK       BK      44.5
VIJAYA BANK     BK      44.5
UNION BANK      BK      47.6
CENTRAL BANK    BK      49.8
INFOSYS         IT      5.6
HCL TECH        IT      5.9
TCS             IT      6.9
CMC             IT      12.6
TECHMAHINDRA    IT      12.6
COGNIZANT       IT      15.8
IGATE           IT      22.4
WIPRO           IT      22.9
HEXAWARE        IT      34.8
MAHINDRA SATYAM IT      34.8
DR. REDDYS      PH      14.5
SUN PHARMA      PH      19.2
CIPLA           PH      23.9
LUPIN           PH      23.9
DIVIS LABS      PH      29

A careful look at the data frame shows that it is sorted by the Category, Margin and then Company columns.

Now, my requirement is to add a new column called Ranking that numbers the rows within each Category. The numbering should restart from 1 whenever a new Category appears in the list.

Sample Output:

Company     Category    Margin     Ranking
SBI             BK      34.5       1
PNB             BK      39.5       2
UCO BANK        BK      39.9       3 
BANK            BK      41.3       4
INDIAN BANK     BK      42.3       5
DENA BANK       BK      44.5       6
VIJAYA BANK     BK      44.5       7
UNION BANK      BK      47.6       8
CENTRAL BANK    BK      49.8       9
INFOSYS         IT      5.6        1
HCL TECH        IT      5.9        2
TCS             IT      6.9        3
CMC             IT      12.6       4
TECHMAHINDRA    IT      12.6       5
COGNIZANT       IT      15.8       6
IGATE           IT      22.4       7
WIPRO           IT      22.9       8
HEXAWARE        IT      34.8       9
MAHINDRA SATYAM IT      34.8       10
DR. REDDYS      PH      14.5       1
SUN PHARMA      PH      19.2       2
CIPLA           PH      23.9       3
LUPIN           PH      23.9       4
DIVIS LABS      PH      29         5

Further Requirement

Assume the input dataset is completely unsorted. Then

unique(df$Category)   # gives 5 different categories
[1] "BK" "IT" "PH" "MT" "EG"

After formatting, the same call returns

unique(df$Category)   # gives only 3 categories; the other 2 were removed
[1] "BK" "IT" "PH"

Note: While formatting the input dataset to free it of missing values, a few categories were completely removed.

Note: The returned data frame should have the categories as its row names.

After ranking the data frame, I would like to write a function that takes a ranking as its parameter. The function should return a data frame with the COMPANY in each CATEGORY that holds that ranking. If a CATEGORY has no COMPANY with that ranking, NA should be returned for it.

head(companyRanks(3), 5)  # returns:
    COMPANY     CATEGORY
BK  UCO BANK        BK      
IT  TCS             IT      
PH  CIPLA           PH      
MT  <NA>            MT
EG  <NA>            EG

head(companyRanks(10), 5)  # returns:
            COMPANY     CATEGORY
BK             <NA>           BK  # Since there is no company with rank 10 under category BK, NA returned
IT  MAHINDRA SATYAM           IT      
PH             <NA>           PH      
MT             <NA>           MT
EG             <NA>           EG

Is there a function that handles this kind of requirement easily?

Kumar asked Dec 25 '22 19:12



1 Answer

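Assuming your data frame is named df, try this:

# ties.method = "first" numbers tied margins in row order (6, 7), not as averages (6.5, 6.5)
df$Ranking <- ave(df$Margin, df$Category, FUN = function(x) rank(x, ties.method = "first"))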
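
To see how ave() behaves on ties, here is a minimal sketch on a small, shuffled subset of the question's sample data. The name rank_avg is only for illustration. With the default rank(), DENA BANK and VIJAYA BANK (both 44.5) would share an averaged rank, while ties.method = "first" numbers them consecutively, as the sample output in the question expects.

```r
# Small subset of the question's sample data, deliberately shuffled
df <- data.frame(
  Company  = c("VIJAYA BANK", "SBI", "TCS", "DENA BANK", "INFOSYS", "PNB"),
  Category = c("BK", "BK", "IT", "BK", "IT", "BK"),
  Margin   = c(44.5, 34.5, 6.9, 44.5, 5.6, 39.5),
  stringsAsFactors = FALSE
)

# Sort as in the question: Category, then Margin, then Company
df <- df[order(df$Category, df$Margin, df$Company), ]

# Default rank() assigns tied values the average of their ranks
rank_avg <- ave(df$Margin, df$Category, FUN = rank)

# ties.method = "first" numbers tied values in row order instead
df$Ranking <- ave(df$Margin, df$Category,
                  FUN = function(x) rank(x, ties.method = "first"))

print(df)
```

Because ave() returns its result in the original row order, sorting first is only needed so that the Company order decides which of two tied rows gets the lower rank.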
Sophia answered Dec 28 '22 08:12

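
The answer above covers the Ranking column only. For the second part of the question (a companyRanks function that returns one row per category, with NA where no company holds the requested rank), the following is one possible sketch, not a definitive implementation. It assumes the full category list is saved before the missing values are dropped. The names raw, clean and all_categories, the default arguments, and the placeholder company in category MT are all hypothetical.

```r
# Raw input: unsorted, with a missing margin. "HYPOTHETICAL MT CO" is a
# placeholder row, so category MT disappears once missing values are dropped.
raw <- data.frame(
  Company  = c("TCS", "SBI", "CIPLA", "HYPOTHETICAL MT CO", "PNB",
               "INFOSYS", "UCO BANK", "HCL TECH", "SUN PHARMA"),
  Category = c("IT", "BK", "PH", "MT", "BK", "IT", "BK", "IT", "PH"),
  Margin   = c(6.9, 34.5, 23.9, NA, 39.5, 5.6, 39.9, 5.9, 19.2),
  stringsAsFactors = FALSE
)

# Remember every category before cleaning removes some of them
all_categories <- unique(raw$Category)

# Drop rows with missing values, then sort and rank within each category
clean <- na.omit(raw)
clean <- clean[order(clean$Category, clean$Margin, clean$Company), ]
clean$Ranking <- ave(clean$Margin, clean$Category,
                     FUN = function(x) rank(x, ties.method = "first"))

companyRanks <- function(n, data = clean, categories = all_categories) {
  # Rows holding the requested rank (at most one per category)
  hit <- data[data$Ranking == n, ]
  # match() gives NA for categories that have no company at rank n
  idx <- match(categories, hit$Category)
  data.frame(COMPANY = hit$Company[idx],
             CATEGORY = categories,
             row.names = categories,
             stringsAsFactors = FALSE)
}

print(companyRanks(3))
print(companyRanks(10))
```

The rows come back in the order the categories first appear in the raw data. Passing sort(all_categories) as the categories argument would give an alphabetical order instead.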