
How to summarize values depending on the category of another variable in R?

Tags: dataframe, r

I have a dataset that records the religious affiliation of Party A and Party B in a given country, along with the percentage of that country's population adhering to each religion.

df <- data.frame(
  PartyA = c("Christian","Muslim","Muslim","Jewish","Sikh"),
  PartyB = c("Jewish","Muslim","Christian","Muslim","Buddhist"),
  ChristianPop = c(12,1,74,14,17),
  MuslimPop = c(71,93,5,86,13),
  JewishPop = c(9,2,12,0,4),
  SikhPop = c(0,0,1,0,10),
  BuddhistPop = c(1,0,2,0,45)
)
#      PartyA    PartyB ChristianPop MuslimPop JewishPop SikhPop BuddhistPop
# 1 Christian    Jewish           12        71         9       0           1
# 2    Muslim    Muslim            1        93         2       0           0
# 3    Muslim Christian           74         5        12       1           2
# 4    Jewish    Muslim           14        86         0       0           0
# 5      Sikh  Buddhist           17        13         4      10          45

With this, I want to compute, for each row, the total share of "involved" religious adherents. Row one would get 12 + 9 = 21, row two only 93 (no addition, since Party A and Party B are the same), and so on.

#      PartyA    PartyB ChristianPop MuslimPop JewishPop SikhPop BuddhistPop PartyRel
# 1 Christian    Jewish           12        71         9       0           1       21
# 2    Muslim    Muslim            1        93         2       0           0       93
# 3    Muslim Christian           74         5        12       1           2       79
# 4    Jewish    Muslim           14        86         0       0           0       86
# 5      Sikh  Buddhist           17        13         4      10          45       55

I'm having a hard time even finding where to begin; any help would be much appreciated.

Dobbleri asked Oct 19 '25 01:10



1 Answer

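We can iterate over the rows with sapply. For each row, paste "Pop" onto the values of PartyA and PartyB to get the matching column names, then use those names to index the row and sum the values. If both parties share the same religion, the value is counted only once. Note that the \(x) lambda shorthand requires R 4.1 or later.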

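# For each row, look up the "<Party>Pop" column for both parties;
# count the value once when the two parties share a religion.
df$PartyRel <- sapply(
  seq_len(nrow(df)),
  \(x) ifelse(df$PartyA[x] == df$PartyB[x],
              df[x, paste0(df$PartyA[x], "Pop")],
              df[x, paste0(df$PartyA[x], "Pop")] + df[x, paste0(df$PartyB[x], "Pop")])
)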

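The same idea as the base R solution, but in tidyverse style: map2 from the purrr package iterates over PartyA and PartyB in parallel, and get() looks up the matching "Pop" column by name within the current row.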

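library(tidyverse)

# rowwise() makes get() return the value for the current row only.
# map2_dbl is used because the Pop columns are stored as doubles.
df %>% 
  rowwise() %>% 
  mutate(PartyRel = map2_dbl(PartyA, PartyB,
                             ~ifelse(.x == .y, 
                                     get(paste0(.x, "Pop")), 
                                     get(paste0(.x, "Pop")) + get(paste0(.y, "Pop"))))) %>% 
  ungroup()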

Output

Both of the above give the same result:

df
     PartyA    PartyB ChristianPop MuslimPop JewishPop SikhPop BuddhistPop PartyRel
1 Christian    Jewish           12        71         9       0           1       21
2    Muslim    Muslim            1        93         2       0           0       93
3    Muslim Christian           74         5        12       1           2       79
4    Jewish    Muslim           14        86         0       0           0       86
5      Sikh  Buddhist           17        13         4      10          45       55
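
For larger data frames, the same lookup can probably be done without a row-by-row loop. The following is a minimal sketch rather than part of the solutions above. It assumes every value in PartyA and PartyB has a matching "<name>Pop" column, and the names pop, rows, a, b and party_rel are illustrative only. It relies on base R matrix indexing with a two-column (row, column) index matrix.

```r
# Rebuild the example data from the question
df <- data.frame(
  PartyA = c("Christian", "Muslim", "Muslim", "Jewish", "Sikh"),
  PartyB = c("Jewish", "Muslim", "Christian", "Muslim", "Buddhist"),
  ChristianPop = c(12, 1, 74, 14, 17),
  MuslimPop = c(71, 93, 5, 86, 13),
  JewishPop = c(9, 2, 12, 0, 4),
  SikhPop = c(0, 0, 1, 0, 10),
  BuddhistPop = c(1, 0, 2, 0, 45)
)

# Numeric matrix holding only the columns whose names end in "Pop"
pop <- as.matrix(df[, grep("Pop$", names(df))])
rows <- seq_len(nrow(df))

# Look up each party's value with a (row, column) index matrix
a <- pop[cbind(rows, match(paste0(df$PartyA, "Pop"), colnames(pop)))]
b <- pop[cbind(rows, match(paste0(df$PartyB, "Pop"), colnames(pop)))]

# Count the value once when both parties have the same religion
party_rel <- ifelse(df$PartyA == df$PartyB, a, a + b)
```

Assigning party_rel to df$PartyRel would add it as a column. A party with no matching "Pop" column would yield NA here instead of an error, which may or may not be what you want.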
benson23 answered Oct 20 '25 16:10



