
Calculate sum of one column based on another column

I have a data frame:

Y  X1  X2  X3
1   1   0  1
1   0   1  1
0   1   0  1
0   0   0  1
1   1   1  0
0   1   1  0

I want to sum the Y column over the rows where another column equals 1, that is, sum(Y=1|Xi=1). For example, for column X1, s1 = sum(Y=1|X1=1) = 1 + 0 + 1 + 0 = 2:

Y  X1
1   1
0   1
1   1
0   1

For the X2 column, s2 = sum(Y=1|X2=1) = 1 + 1 + 0 = 2:

    Y   X2
    1   1
    1   1
    0   1

For the X3 column, s3 = sum(Y=1|X3=1) = 1 + 1 + 0 + 0 = 2:

    Y    X3
    1    1
    1    1
    0    1
    0    1

I have a rough idea to use apply(df, 2, sum) on the columns of the data frame, but I don't know how to subset the rows based on each Xi and then calculate the sum of Y. Any help is appreciated!

Jassy.W asked Mar 27 '17 21:03




1 Answer

There are numerous ways to do this. One is to subset the rows on the column you want and then sum Y:

sum(df[df$X1==1,]$Y)

This should work for you.
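To cover every X column at once rather than only X1, the same subsetting can be wrapped in sapply. The following is a minimal sketch, not necessarily what the answerer had in mind. It assumes the data frame is named df, that Y is the first column, and that all remaining columns are 0/1 indicators. The names s1 and sums are made up for illustration.

```r
# Recreate the data frame from the question
df <- data.frame(
  Y  = c(1, 1, 0, 0, 1, 0),
  X1 = c(1, 0, 1, 0, 1, 1),
  X2 = c(0, 1, 0, 0, 1, 1),
  X3 = c(1, 1, 1, 1, 0, 0)
)

# Single column: sum Y over the rows where X1 equals 1
s1 <- sum(df$Y[df$X1 == 1])

# All X columns: df[-1] drops Y (assumed to be the first column),
# and sapply applies the same subset-and-sum to each remaining column
sums <- sapply(df[-1], function(x) sum(df$Y[x == 1]))
print(sums)
```

Because the X columns hold only 0 and 1, colSums(df[-1] * df$Y) should give the same result. The sapply version also works if the condition is later changed to something other than equality with 1.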

M-- answered Sep 30 '22 00:09