How to lookup and sum multiple columns in R

Tags:

r

dplyr

Suppose I have 2 dataframes structured as such:

GROUPS:

P1      P2      P3      P4
123     213     312     231
345     123     213     567

INDIVIDUAL_RESULTS:

ID      SCORE
123     23
213     12
312     11
231     19
345     10
567     22

I want to add a column to GROUPS that holds the sum of each group's individual scores:

P1      P2      P3      P4      SCORE
123     213     312     231     65

I've tried various merge techniques, but have only created a mess. I feel like there's a simple solution I just don't know about, and I would really appreciate some guidance!


asked Oct 18 '19 13:10

FloatingFish

2 Answers

d1 <- read.table(text = "
P1      P2      P3      P4
123     213     312     231
345     123     213     567", header = TRUE)

d2 <- read.table(text = "
ID      SCORE
123     23
213     12
312     11
231     19
345     10
567     22", header = TRUE)

I will use the apply and match functions. apply() calls a function on each row of d1; inside it, match() finds the position of each of the row's IDs in d2$ID, and those positions are used to pick the corresponding values out of d2$SCORE. Finally, those values are summed.
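To see what match() returns on its own, here is a minimal illustration for just the first row of d1 (the `row_ids` and `idx` names are introduced here for illustration). Note that match() returns NA for an ID that is not in d2$ID, which would make the sum NA.

```r
# Lookup table from the question (with 231 as the fourth ID)
d2 <- data.frame(ID = c(123L, 213L, 312L, 231L, 345L, 567L),
                 SCORE = c(23L, 12L, 11L, 19L, 10L, 22L))

# The four IDs in the first row of d1 (hypothetical helper variable)
row_ids <- c(123L, 213L, 312L, 231L)

idx <- match(row_ids, d2$ID)  # position of each ID within d2$ID
idx                           # [1] 1 2 3 4
d2$SCORE[idx]                 # [1] 23 12 11 19
sum(d2$SCORE[idx])            # [1] 65
```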

d1$SCORE=apply(d1,1,function(x){
  sum(d2$SCORE[match(x,d2$ID)])
})

The result:

   P1  P2  P3  P4 SCORE
1 123 213 312 231    65
2 345 123 213 567    67
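The apply() loop works row by row. Because match() is itself vectorized, a sketch of an alternative (not the answerer's code) is to look up every ID in one call, reshape the scores back into the shape of d1, and sum each row with rowSums(). It assumes the ID columns are named P1 to P4, as in the question.

```r
# Rebuild the example data from the question
d1 <- data.frame(P1 = c(123L, 345L), P2 = c(213L, 123L),
                 P3 = c(312L, 213L), P4 = c(231L, 567L))
d2 <- data.frame(ID = c(123L, 213L, 312L, 231L, 345L, 567L),
                 SCORE = c(23L, 12L, 11L, 19L, 10L, 22L))

# Look up every ID at once; match() treats the matrix as a vector in
# column-major order, so matrix(..., nrow = nrow(ids)) restores the layout
ids <- as.matrix(d1[, c("P1", "P2", "P3", "P4")])
scores <- matrix(d2$SCORE[match(ids, d2$ID)], nrow = nrow(ids))

# Sum across each row (each group); rowSums(..., na.rm = TRUE) would
# skip IDs that are missing from d2 instead of returning NA
d1$SCORE <- rowSums(scores)
d1
```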

answered Oct 19 '22 04:10

user2974951

Here is a slower approach, but one that may be more intuitive for new users. I think the difficulty comes from the format of your data d1, which spreads each group's IDs across columns. If you tidy it up a little:

library(tidyverse)
d1<-data.frame(t(d1))
colnames(d1) <-c("group1", "group2")
d1$P = row.names(d1)
d1<-d1 %>% 
  pivot_longer(
    cols = group1:group2, 
    names_to = "Group",
    values_to = "ID"
  )  

df <-left_join(d1, d2, by ="ID")
df

# A tibble: 8 x 4
  P     Group     ID SCORE
  <chr> <chr>  <int> <int>
1 P1    group1   123    23
2 P1    group2   345    10
3 P2    group1   213    12
4 P2    group2   123    23
5 P3    group1   312    11
6 P3    group2   213    12
7 P4    group1   231    19
8 P4    group2   567    22

Once the data is in this more conventional (long) format, a tidyverse solution is straightforward:

df  %>% 
  group_by(Group) %>% 
  summarize(SCORE = sum(SCORE))
# A tibble: 2 x 2
  Group  SCORE
  <chr>  <int>
1 group1    65
2 group2    67
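The summary above returns the totals as a separate table. To attach them to GROUPS as a SCORE column, as the question asks, one option (a sketch, not the answerer's code) is to pivot the original wide d1 directly, sum per group, and join the totals back. The `group` key is a hypothetical helper column introduced here and dropped at the end.

```r
library(dplyr)
library(tidyr)

d1 <- data.frame(P1 = c(123L, 345L), P2 = c(213L, 123L),
                 P3 = c(312L, 213L), P4 = c(231L, 567L))
d2 <- data.frame(ID = c(123L, 213L, 312L, 231L, 345L, 567L),
                 SCORE = c(23L, 12L, 11L, 19L, 10L, 22L))

# Hypothetical helper key: one number per group (row) of d1
d1_keyed <- d1 %>% mutate(group = row_number())

# Long format: one row per (group, position, ID), then look up scores
totals <- d1_keyed %>%
  pivot_longer(P1:P4, names_to = "P", values_to = "ID") %>%
  left_join(d2, by = "ID") %>%
  group_by(group) %>%
  summarize(SCORE = sum(SCORE), .groups = "drop")

# Join the per-group totals back onto the wide table and drop the key
result <- d1_keyed %>%
  left_join(totals, by = "group") %>%
  select(-group)
result
```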

answered Oct 19 '22 05:10

Zhiqiang Wang
