
Using dplyr in order to create a new column

Tags:

r

dplyr

Hello, I have a data frame such as:

Groups COL1 COL2
G1 1 A
G1 1 C
G1 2 A
G1 2 B
G1 5 C
G1 6 C
G2 7 B
G2 7 B
G2 8 C
G3 10 C
G3 10 A
G3 11 B
G4 12 C
G4 12 C

The idea is to add a new column COL3, along the lines of this pseudocode:

group_by(Groups, COL1) %>%
  mutate(COL3 = COL1(A>B>C))

The rule is that, within each combination of Groups and COL1, the COL2 values are collapsed by priority: if A is present (alone or alongside B or C), every value becomes A; if A is absent but B is present, every value becomes B; and if only C is present, every value stays C.

So the priority is A > B > C. Here is the expected output:

Groups COL1 COL2 COL3
G1 1 A A
G1 1 C A 
G1 2 A A
G1 2 B A
G1 5 C C
G1 6 C C
G2 7 B B
G2 7 B B
G2 8 C C
G3 10 C A
G3 10 A A
G3 11 B B
G4 12 C C
G4 12 C C

Does anyone have an idea?

asked Dec 04 '22 17:12 by chippycentra


2 Answers

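If COL2 can be meaningfully sorted, min() should work. Here the priority A > B > C matches alphabetical order, so the minimum of COL2 within each group is the value you want: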

df <- structure(
    list(Groups = c("G1", "G1", "G1", "G1", "G1", "G1", "G2", "G2", "G2", "G3",
                    "G3", "G3", "G4", "G4"),
         COL1 = c(1L, 1L, 2L, 2L, 5L, 6L, 7L, 7L, 8L, 10L, 10L, 11L, 12L, 12L),
         COL2 = c("A", "C", "A", "B", "C", "C", "B", "B", "C", "C", "A", "B",
                  "C", "C")),
    class = "data.frame", row.names = c(NA, -14L))

library("dplyr")


df %>% 
    group_by(Groups, COL1) %>% 
    mutate(COL3 = min(COL2))
#> # A tibble: 14 x 4
#> # Groups:   Groups, COL1 [9]
#>    Groups  COL1 COL2  COL3 
#>    <chr>  <int> <chr> <chr>
#>  1 G1         1 A     A    
#>  2 G1         1 C     A    
#>  3 G1         2 A     A    
#>  4 G1         2 B     A    
#>  5 G1         5 C     C    
#>  6 G1         6 C     C    
#>  7 G2         7 B     B    
#>  8 G2         7 B     B    
#>  9 G2         8 C     C    
#> 10 G3        10 C     A    
#> 11 G3        10 A     A    
#> 12 G3        11 B     B    
#> 13 G4        12 C     C    
#> 14 G4        12 C     C

Created on 2020-05-28 by the reprex package (v0.3.0)
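
If the priority does not follow alphabetical order, min() on the raw strings no longer picks the right value. A minimal sketch of one way around that, assuming a hypothetical `priority` vector and made-up labels ("high", "mid", "low") that are not part of the question: look up each value's position with match() and take the smallest position per group.

```r
library(dplyr)

# Toy data with labels whose alphabetical order differs from the priority
# (these labels are illustrative, not from the original question)
df2 <- data.frame(
  Groups = c("G1", "G1", "G1", "G2"),
  COL1   = c(1L, 1L, 2L, 7L),
  COL2   = c("low", "high", "mid", "low"),
  stringsAsFactors = FALSE
)

# Hypothetical priority order: earlier entries win
priority <- c("high", "mid", "low")

res <- df2 %>%
  group_by(Groups, COL1) %>%
  # match() gives each value's rank in `priority`; the smallest rank wins
  mutate(COL3 = priority[min(match(COL2, priority))]) %>%
  ungroup()

res
```

Any COL2 value missing from `priority` makes match() return NA, so the whole group ends up with NA in COL3; that may or may not be what you want.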

answered Dec 11 '22 17:12 by hplieninger


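I think this gives the expected result, assuming COL2 is a factor whose levels are in the order A, B, C (note that this overwrites COL2 rather than adding COL3):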

df %>% 
  group_by(Groups, COL1) %>% 
  mutate(COL2 = levels(COL2)[min(as.numeric(COL2))])
#> # Groups:   Groups, COL1 [9]
#>    Groups  COL1 COL2 
#>    <fct>  <int> <chr>
#>  1 G1         1 A    
#>  2 G1         1 A    
#>  3 G1         2 A    
#>  4 G1         2 A    
#>  5 G1         5 C    
#>  6 G1         6 C    
#>  7 G2         7 B    
#>  8 G2         7 B    
#>  9 G2         8 C    
#> 10 G3        10 A    
#> 11 G3        10 A    
#> 12 G3        11 B    
#> 13 G4        12 C    
#> 14 G4        12 C 
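
For completeness, here is a sketch of the same factor-based idea that keeps COL2 and writes the result to COL3, as the question asks. It assumes COL2 is built as a factor with levels explicitly set to A, B, C; the data frame is re-created so the snippet runs on its own.

```r
library(dplyr)

df <- data.frame(
  Groups = c("G1", "G1", "G1", "G1", "G1", "G1", "G2", "G2", "G2",
             "G3", "G3", "G3", "G4", "G4"),
  COL1   = c(1L, 1L, 2L, 2L, 5L, 6L, 7L, 7L, 8L, 10L, 10L, 11L, 12L, 12L),
  # Level order encodes the priority: A beats B, B beats C
  COL2   = factor(c("A", "C", "A", "B", "C", "C", "B", "B", "C",
                    "C", "A", "B", "C", "C"),
                  levels = c("A", "B", "C"))
)

res <- df %>%
  group_by(Groups, COL1) %>%
  # as.integer() on a factor gives level codes; the smallest code is the
  # highest-priority level present in the group
  mutate(COL3 = levels(COL2)[min(as.integer(COL2))]) %>%
  ungroup()

res
```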

answered Dec 11 '22 15:12 by Allan Cameron