Suppose I have two data frames structured like this:
GROUPS:
P1 P2 P3 P4
123 213 312 231
345 123 213 567
INDIVIDUAL_RESULTS:
ID SCORE
123 23
213 12
312 11
231 19
345 10
567 22
I want to add a column to GROUPS that holds the sum of each group's individual results:
P1 P2 P3 P4 SCORE
123 213 312 231 65
I've tried various merge techniques, but have really just created a mess. I feel like there's a simple solution I just don't know about; I would really appreciate some guidance!
Often you may want to find the sum of a specific set of columns in a data frame in R. This is easy to do with the rowSums() function: subset the data frame to the columns you need, and rowSums() returns one total per row.
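Applied to the question's data, a minimal base R sketch (assuming d1 and d2 hold exactly the values shown in the question) first replaces every ID with its score via match(), then lets rowSums() add up the chosen columns. The intermediate name `scores` is illustrative.

```r
# Data from the question (ID 231 scores 19, giving the expected total of 65)
d1 <- data.frame(P1 = c(123L, 345L), P2 = c(213L, 123L),
                 P3 = c(312L, 213L), P4 = c(231L, 567L))
d2 <- data.frame(ID    = c(123L, 213L, 312L, 231L, 345L, 567L),
                 SCORE = c(23L, 12L, 11L, 19L, 10L, 22L))

# Replace each ID by its score; match() works on the matrix column by column,
# so rebuilding it with the same number of rows restores the original layout
scores <- matrix(d2$SCORE[match(as.matrix(d1), d2$ID)], nrow = nrow(d1),
                 dimnames = list(NULL, names(d1)))

# Sum only the columns of interest (here all four positions);
# drop = FALSE keeps a matrix even if a single column is selected
d1$SCORE <- rowSums(scores[, c("P1", "P2", "P3", "P4"), drop = FALSE])
d1
```

The second row comes out as 67 (10 + 23 + 12 + 22), the same totals the apply() answer further down reports.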
The rowSums() function calculates the sum of each row. In a dplyr pipeline, mutate() can append that sum to every row under a new column name of your choice; the . placeholder stands for the whole data frame piped in with %>%, so rowSums(.) sums across all of its columns. Syntax: mutate(sum_of_rows = rowSums(.))
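A small sketch of the mutate() pattern, assuming dplyr is installed; the score table is hypothetical (the question's scores already looked up per position). Because . refers to the data frame as it entered the pipe, the new column is not counted in its own sum.

```r
library(dplyr)

# Hypothetical per-position scores for two groups
scores <- data.frame(P1 = c(23, 10), P2 = c(12, 23),
                     P3 = c(11, 12), P4 = c(19, 22))

# `.` is the piped-in data frame, so rowSums(.) adds up every existing column
scores <- scores %>%
  mutate(sum_of_rows = rowSums(.))

scores
```

With the native |> pipe the . placeholder is not defined; in dplyr 1.1.0 and later, rowSums(pick(everything())) inside mutate() is an alternative.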
To sum two data frame columns, possibly from different data frames, select each column with the $ operator and add them with the + operator; the same works for more than two columns.
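A minimal sketch with two hypothetical data frames (df1, df2 and their columns are made up for illustration). The + operator works element-wise, so it assumes the rows of both data frames are aligned by position.

```r
# Two hypothetical data frames with the same number of rows
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
df2 <- data.frame(x = c(10, 20, 30))

# Sum of two columns taken from different data frames (element-wise)
two_cols <- df1$a + df2$x

# Sum of several columns: chain `+`, or fold a list of columns with Reduce()
all_cols        <- df1$a + df1$b + df2$x
all_cols_reduce <- Reduce(`+`, list(df1$a, df1$b, df2$x))
```

If the columns have different lengths, R recycles the shorter one (warning only when the lengths are not multiples), so it is worth checking nrow() of both data frames first.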
d1 <- read.table(text = "
P1 P2 P3 P4
123 213 312 231
345 123 213 567", header = TRUE)
d2 <- read.table(text = "
ID SCORE
123 23
213 12
312 11
231 19
345 10
567 22", header = TRUE)
I will be using the apply() and match() functions. apply() calls a function on each row of d1. Inside that function, match() finds the position of each ID from the row in d2$ID, and those positions are used to pull the corresponding values from d2$SCORE. Finally, the values are summed.
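To see what happens inside the function for a single row, here is a step-by-step sketch for the first group, using plain vectors copied from d2 (the names ids, scores and row1 are illustrative).

```r
# IDs and scores from d2 in the question
ids    <- c(123L, 213L, 312L, 231L, 345L, 567L)
scores <- c(23L, 12L, 11L, 19L, 10L, 22L)

# First group (first row of d1)
row1 <- c(123L, 213L, 312L, 231L)

idx           <- match(row1, ids)  # position of each member ID in `ids`: 1 2 3 4
member_scores <- scores[idx]       # 23 12 11 19
total         <- sum(member_scores) # 65

# An ID missing from d2 makes match() return NA; na.rm = TRUE skips it
total_with_missing <- sum(scores[match(c(123L, 999L), ids)], na.rm = TRUE)  # 23
```

Without na.rm = TRUE, a single unknown ID would turn the whole group's total into NA, which may or may not be the behaviour you want.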
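# apply() turns d1 into a matrix; x is one row of member IDs (MARGIN = 1)
d1$SCORE <- apply(d1, 1, function(x) {
  # look up each member's score by ID and add them up
  sum(d2$SCORE[match(x, d2$ID)])
})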
The result:
P1 P2 P3 P4 SCORE
1 123 213 312 231 65
2 345 123 213 567 67
I would try a slower approach that may be more intuitive for new users. I think the difficulty comes from the format of your data d1. If you tidy it up a little:
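library(tidyverse)
# Start from the original d1 (P1..P4 only, without a SCORE column)
# Transpose so each position (P1..P4) is a row and each group is a column
d1 <- data.frame(t(d1))
colnames(d1) <- c("group1", "group2")
d1$P <- row.names(d1)
# Reshape to long format: one row per (position, group) pair
d1 <- d1 %>%
  pivot_longer(
    cols = group1:group2,
    names_to = "Group",
    values_to = "ID"
  )
# Attach each member's score by ID
df <- left_join(d1, d2, by = "ID")
df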
# A tibble: 8 x 4
P Group ID SCORE
<chr> <chr> <int> <int>
1 P1 group1 123 23
2 P1 group2 345 10
3 P2 group1 213 12
4 P2 group2 123 23
5 P3 group1 312 11
6 P3 group2 213 12
7 P4 group1 231 19
8 P4 group2 567 22
Once the data is in this more "conventional" long format, a tidyverse solution is straightforward:
df %>%
group_by(Group) %>%
summarize(SCORE = sum(SCORE))
# A tibble: 2 x 2
Group SCORE
<chr> <int>
1 group1 65
2 group2 67
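The summary above gives one total per group but does not attach it to the original wide table the question asked for. A possible variation, assuming dplyr and tidyr are installed and d1 is still in its original wide form, pivots directly without transposing and tags each row with a hypothetical Group number so the totals can be joined back.

```r
library(dplyr)
library(tidyr)

d1 <- data.frame(P1 = c(123L, 345L), P2 = c(213L, 123L),
                 P3 = c(312L, 213L), P4 = c(231L, 567L))
d2 <- data.frame(ID    = c(123L, 213L, 312L, 231L, 345L, 567L),
                 SCORE = c(23L, 12L, 11L, 19L, 10L, 22L))

# Tag each group (row) with an illustrative Group number
d1_tagged <- d1 %>% mutate(Group = row_number())

# Long format, look up scores, then total them per group
totals <- d1_tagged %>%
  pivot_longer(P1:P4, names_to = "P", values_to = "ID") %>%
  left_join(d2, by = "ID") %>%
  group_by(Group) %>%
  summarize(SCORE = sum(SCORE))

# Join the totals back onto the wide table and drop the helper column
result <- d1_tagged %>%
  left_join(totals, by = "Group") %>%
  select(-Group)

result
```

Pivoting P1:P4 directly avoids the transpose step, so the same code should keep working if GROUPS gains more rows.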