 

R: calculate the quotient of 2 columns by ID

Tags: r, percentage

Here is my data:

ID      nb     ecart    
ID1     3       NA  
ID1     3       0    
ID1     3       1.5 
ID2     2       NA  
ID2     2       648 
ID3     4       NA 
ID3     4       0  
ID3     4       0 
ID3     4       7 
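To make the example reproducible, the table above can be entered as a data frame. This is a minimal sketch. The name df is an assumption chosen to match the data.table answer below, and ID is stored as character.

```r
# Example data from the question (the name `df` is an assumption)
df <- data.frame(
  ID    = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID3"),
  nb    = c(3, 3, 3, 2, 2, 4, 4, 4, 4),
  ecart = c(NA, 0, 1.5, NA, 648, NA, 0, 0, 7),
  stringsAsFactors = FALSE
)

# Sanity check: nb is the number of rows for each ID, as the question states
stopifnot(all(df$nb == ave(df$nb, df$ID, FUN = length)))
df
```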

For each ID, I want to calculate the percentage of rows where ecart == 0 (the number of zeros divided by nb). The value should appear only on the rows where ecart is 0, with NA everywhere else.

nb is the number of rows for each ID.

The expected result:

ID      nb     ecart    percentage
ID1     3       NA        NA
ID1     3       0        1/3
ID1     3       1.5       NA
ID2     2       NA        NA
ID2     2       648       NA
ID3     4       NA        NA
ID3     4       0        2/4
ID3     4       0        2/4
ID3     4       7         NA

Hope to get your answer soon! Thanks!

velvetrock asked Jul 16 '15 13:07


2 Answers

A quick and efficient data.table solution

library(data.table)
setDT(df)[ecart == 0L, percentage := round(.N / nb, 2L), by = ID]
#     ID nb ecart percentage
# 1: ID1  3    NA         NA
# 2: ID1  3   0.0       0.33
# 3: ID1  3   1.5         NA
# 4: ID2  2    NA         NA
# 5: ID2  2 648.0         NA
# 6: ID3  4    NA         NA
# 7: ID3  4   0.0       0.50
# 8: ID3  4   0.0       0.50
# 9: ID3  4   7.0         NA

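How this works: the ecart == 0L condition restricts the update to rows where ecart is 0. Within those rows, by = ID groups them, and percentage is assigned by reference as the size of each subgroup (.N) divided by nb. Rows that do not match the condition are left as NA.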
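The same logic can be sketched in base R without data.table, which may help show what .N is doing. This is an illustrative alternative, not part of the original answer. It assumes the example data is rebuilt as df, and it uses ave(..., FUN = length) to count the zero rows per ID, which plays the role of .N here.

```r
# Example data from the question (the name `df` is an assumption)
df <- data.frame(
  ID    = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID3"),
  nb    = c(3, 3, 3, 2, 2, 4, 4, 4, 4),
  ecart = c(NA, 0, 1.5, NA, 648, NA, 0, 0, 7),
  stringsAsFactors = FALSE
)

# Rows where ecart is exactly 0; %in% returns FALSE (not NA) for missing values
is_zero <- df$ecart %in% 0

# Start with NA everywhere, then fill only the zero rows
df$percentage <- NA_real_

# For each zero row: how many zero rows share its ID (the counterpart of .N)
zero_count <- ave(df$nb[is_zero], df$ID[is_zero], FUN = length)
df$percentage[is_zero] <- round(zero_count / df$nb[is_zero], 2)
df
```

Because the subset is taken before grouping, IDs with no zeros (ID2 here) simply never receive a value and keep NA, which is the same behaviour as the data.table call.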


Or (as commented by @CathG), if you want a character label such as "1/3" instead of a numeric value, you can do

setDT(df)[ecart == 0L, percentage := paste0(.N, "/", nb), by = ID]

Or, if you prefer to use a binary join:

setkey(setDT(df), ecart)[.(0L), percentage := paste0(.N, "/", nb), by = ID]
David Arenburg answered Oct 04 '22 21:10


Here's a dplyr answer.

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(percentage =
         ifelse(ecart %in% 0,                  # only rows where ecart is exactly 0
                sum(ecart %in% 0) / n(),       # number of zero rows / rows in the ID
                NA)) %>%
  ungroup()
Mhairi McNeill answered Oct 04 '22 21:10