 

How to lookup and sum multiple columns in R

Tags:

r

dplyr

Suppose I have two data frames structured like this:

GROUPS:

P1      P2      P3      P4
123     213     312     231
345     123     213     567

INDIVIDUAL_RESULTS:

ID      SCORE
123     23
213     12
312     11
231     19
345     10
567     22

I want to add a column to GROUPS that holds the sum of the individual scores for the IDs in each row:

P1      P2      P3      P4      SCORE
123     213     312     231     65

I've tried various merge techniques, but have only made a mess. I suspect there's a simple solution I just don't know about, so I would really appreciate some guidance!

asked Oct 18 '19 13:10 by FloatingFish


2 Answers

d1=read.table(text="
P1      P2      P3      P4
123     213     312     231
345     123     213     567",header=TRUE)

d2=read.table(text="
ID      SCORE
123     23
213     12
312     11
231     19
345     10
567     22",header=TRUE)

I will be using the apply and match functions. apply() calls the function on each row of d1. Inside it, match() finds the position (index) of each ID from that row in d2$ID, and those indices are used to pull the corresponding values from d2$SCORE. Finally, sum() adds them up.
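To make the lookup concrete, here is the first row of d1 traced step by step. This is a small sketch that hard-codes that row and rebuilds d2 with data.frame() (instead of read.table) so it runs on its own; `row1`, `idx`, `scores` and `total` are just illustrative names.

```r
# Lookup table, same values as d2 above
d2 <- data.frame(ID = c(123, 213, 312, 231, 345, 567),
                 SCORE = c(23, 12, 11, 19, 10, 22))

# First row of d1 (P1..P4), hard-coded for illustration
row1 <- c(123, 213, 312, 231)

idx <- match(row1, d2$ID)   # position of each ID in d2$ID: 1 2 3 4
scores <- d2$SCORE[idx]     # scores at those positions: 23 12 11 19
total <- sum(scores)        # 65, the SCORE for the first row
```

Note that match() returns NA for an ID that is missing from d2$ID, so sum() would return NA for that row unless you pass na.rm = TRUE.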

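# For each row x of d1: find each ID's position in d2$ID,
# take the scores at those positions and add them up
d1$SCORE=apply(d1,1,function(x){
  sum(d2$SCORE[match(x,d2$ID)])
})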

and the result:

   P1  P2  P3  P4 SCORE
1 123 213 312 231    65
2 345 123 213 567    67
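The same match() idea can also be written without apply(), by matching all IDs at once and summing per row with rowSums(). This is a sketch, not part of the original answer; it assumes the ID columns are named P1 to P4, and `id_cols`, `ids` and `score_mat` are illustrative names.

```r
d1 <- data.frame(P1 = c(123, 345), P2 = c(213, 123),
                 P3 = c(312, 213), P4 = c(231, 567))
d2 <- data.frame(ID = c(123, 213, 312, 231, 345, 567),
                 SCORE = c(23, 12, 11, 19, 10, 22))

# Select the ID columns explicitly so a previously added SCORE column is never matched
id_cols <- c("P1", "P2", "P3", "P4")
ids <- as.matrix(d1[, id_cols])

# match() returns a plain vector (column-major order),
# so reshape the looked-up scores back into the shape of ids
score_mat <- matrix(d2$SCORE[match(ids, d2$ID)], nrow = nrow(ids))

# One total per row of d1
d1$SCORE <- rowSums(score_mat)
```

Selecting the ID columns by name matters if you run the code twice: apply(d1, 1, ...) as written above would also pass the new SCORE column to match() on a second run.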
answered Oct 19 '22 04:10 by user2974951


I would try a slower approach that may be more intuitive for new users. I think the difficulty comes from the wide layout of your data d1, where the IDs of each group are spread across the columns P1 to P4. If you tidy it up a little:

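library(tidyverse)
# Assumes d1 is the original two-row data frame with columns P1..P4 only
# Transpose: each P column becomes a row, each original row becomes a column
d1<-data.frame(t(d1))
colnames(d1) <-c("group1", "group2")
# Keep the position labels (P1..P4) as a regular column
d1$P = row.names(d1)
# Stack group1/group2 into a single ID column, labelled by Group
d1<-d1 %>% 
  pivot_longer(
    cols = group1:group2, 
    names_to = "Group",
    values_to = "ID"
  )  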

df <-left_join(d1, d2, by ="ID")
df

# A tibble: 8 x 4
  P     Group     ID SCORE
  <chr> <chr>  <int> <int>
1 P1    group1   123    23
2 P1    group2   345    10
3 P2    group1   213    12
4 P2    group2   123    23
5 P3    group1   312    11
6 P3    group2   213    12
7 P4    group1   231    19
8 P4    group2   567    22
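Since the question mentions trying merge(), here is a base R sketch of the same long-format idea using merge() and aggregate(). It is not from the original answer; `long`, `merged` and `totals` are illustrative names, and it builds the long table directly from d1 rather than transposing it.

```r
d1 <- data.frame(P1 = c(123, 345), P2 = c(213, 123),
                 P3 = c(312, 213), P4 = c(231, 567))
d2 <- data.frame(ID = c(123, 213, 312, 231, 345, 567),
                 SCORE = c(23, 12, 11, 19, 10, 22))

# Long format: one row per (group, position) pair.
# unlist() walks d1 column by column, so group repeats 1, 2, 1, 2, ...
long <- data.frame(
  group = rep(seq_len(nrow(d1)), times = ncol(d1)),
  P     = rep(names(d1), each = nrow(d1)),
  ID    = unlist(d1, use.names = FALSE)
)

# Join the scores on ID (inner join), then total them per group
merged <- merge(long, d2, by = "ID")
totals <- aggregate(SCORE ~ group, data = merged, FUN = sum)

# Attach the totals back to the original rows
d1$SCORE <- totals$SCORE[match(seq_len(nrow(d1)), totals$group)]
```

Because merge() performs an inner join by default, an ID missing from d2 is silently dropped rather than producing NA; use all.x = TRUE if you would rather see the gap.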

Once the data is in this more "conventional" long format, a tidyverse solution is straightforward:

df  %>% 
  group_by(Group) %>% 
  summarize(SCORE = sum(SCORE))
# A tibble: 2 x 2
  Group  SCORE
  <chr>  <int>
1 group1    65
2 group2    67
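A variation on this answer, sketched here under the assumption that dplyr and tidyr are installed: pivot_longer() can reshape d1 directly, without t(), by first tagging each row with a number, and the per-row totals can then be joined back to get the wide layout the question asked for. The names `group`, `totals` and `d1_scored` are illustrative, not from the original answer.

```r
library(dplyr)
library(tidyr)

d1 <- data.frame(P1 = c(123, 345), P2 = c(213, 123),
                 P3 = c(312, 213), P4 = c(231, 567))
d2 <- data.frame(ID = c(123, 213, 312, 231, 345, 567),
                 SCORE = c(23, 12, 11, 19, 10, 22))

totals <- d1 %>%
  mutate(group = row_number()) %>%                       # remember the source row
  pivot_longer(P1:P4, names_to = "P", values_to = "ID") %>%
  left_join(d2, by = "ID") %>%                           # look up each ID's score
  group_by(group) %>%
  summarise(SCORE = sum(SCORE), .groups = "drop")        # one total per row

# Put the totals back next to P1..P4
d1_scored <- d1 %>%
  mutate(group = row_number()) %>%
  left_join(totals, by = "group") %>%
  select(-group)
```

Tagging rows with row_number() before reshaping avoids hard-coding the number of groups, which the t() plus colnames() approach above needs.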
answered Oct 19 '22 05:10 by Zhiqiang Wang