Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mutate by group in R

I have a data with following columns:

    Date         CID      FID        rank  
    31/01/17     abc0001  rx180x01    0
    31/01/17     abc0001  rx180x02    0
    31/01/17     abc0001  rx180x03    2
    28/02/17     abc0001  rx180x32    1
    28/02/17     abc0001  rx180x31    0

Each CID has some unique FIDs mapped to it and has some rank for these FIDs. I need to create 2 new columns finalrank and finalFID.

Final rank is the maximum rank of each CID for each date. ie For CID abc0001 ,in the date 31/01/17, the finalrank would be 2. This logic would be applied across all unique CID-date combination.

FinalFID is that FID which has the particular finalrank value for a CID in a date.For eg, in the date 31/01/17, for CID abc0001 , the FID with maximum rank is rx180x03

So my result should look like this:

    Date         CID      FID        rank  finalrank finalFID
    31/01/17     abc0001  rx180x01    0    2         rx180x03
    31/01/17     abc0001  rx180x02    0    2         rx180x03
    31/01/17     abc0001  rx180x03    2    2         rx180x03
    28/02/17     abc0001  rx180x32    1    1         rx180x32
    28/02/17     abc0001  rx180x31    0    1         rx180x32

I have written a code which seems to be elegant and fine but it is not working for very large data. The one I'm dealing with has like 5,000,000. When I run it in R it is showing running an nothing else for such huge dataframes.

       data          = dplyr::group_by(data,CID,date)
       data          = arrange(data,CID,date)
       data          = dplyr::mutate(data, finalrank =max(rank))

       # Id FID of maximum rank
       data   = dplyr::mutate(data, match = FID[match(finalrank ,rank)])
like image 462
Dom Jo Avatar asked Jan 27 '23 19:01

Dom Jo


1 Answers

dat%>%
   group_by(Date,CID)%>%
   mutate(finalrank=max(rank),finalFID=FID[which.max(rank)])
# A tibble: 5 x 6
# Groups:   Date, CID [2]
  Date     CID     FID       rank finalrank finalFID
  <fct>    <fct>   <fct>    <int>     <dbl> <fct>   
1 31/01/17 abc0001 rx180x01     0         2 rx180x03
2 31/01/17 abc0001 rx180x02     0         2 rx180x03
3 31/01/17 abc0001 rx180x03     2         2 rx180x03
4 28/02/17 abc0001 rx180x32     1         1 rx180x32
5 28/02/17 abc0001 rx180x31     0         1 rx180x32

using data.table

library(data.table)
setDT(dat)[,c("finalrank","finalFID"):=.(max(rank),FID[which.max(rank)]),by=.(Date,CID)]
dat 
       Date     CID      FID rank finalrank finalFID
1: 31/01/17 abc0001 rx180x01    0         2 rx180x03
2: 31/01/17 abc0001 rx180x02    0         2 rx180x03
3: 31/01/17 abc0001 rx180x03    2         2 rx180x03
4: 28/02/17 abc0001 rx180x32    1         1 rx180x32
5: 28/02/17 abc0001 rx180x31    0         1 rx180x32
like image 127
KU99 Avatar answered Feb 07 '23 16:02

KU99