I have a data frame:
Y X1 X2 X3
1 1 0 1
1 0 1 1
0 1 0 1
0 0 0 1
1 1 1 0
0 1 1 0
I want to sum the values in column Y over the rows where another column equals 1, i.e. sum(Y | Xi = 1). For example, for column X1, s1 = sum(Y | X1 = 1) = 1 + 0 + 1 + 0 = 2:
Y X1
1 1
0 1
1 1
0 1
For the X2 column, s2 = sum(Y | X2 = 1) = 1 + 1 + 0 = 2:
Y X2
1 1
1 1
0 1
For the X3 column, s3 = sum(Y | X3 = 1) = 1 + 1 + 0 + 0 = 2:
Y X3
1 1
1 1
0 1
0 1
I have a rough idea to use apply(df, 2, sum) over the columns of the data frame, but I don't know how to subset the rows by each Xi before summing Y.
Any help is appreciated!
There are numerous ways to do this. One is to keep only the rows where the column of interest equals 1, then sum Y over those rows:
sum(df[df$X1==1,]$Y)
This should work for you.
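To get the sum for every Xi at once rather than one column at a time, the same subsetting idea can be wrapped in sapply. The following is a minimal sketch, assuming the data frame is named df as in the question and that every column other than Y is a 0/1 indicator; the names x_cols, sums and sums_alt are only illustrative.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  Y  = c(1, 1, 0, 0, 1, 0),
  X1 = c(1, 0, 1, 0, 1, 1),
  X2 = c(0, 1, 0, 0, 1, 1),
  X3 = c(1, 1, 1, 1, 0, 0)
)

# Columns to condition on (assumption: every column except Y)
x_cols <- setdiff(names(df), "Y")

# For each X column, sum Y over the rows where that column equals 1
sums <- sapply(df[x_cols], function(x) sum(df$Y[x == 1]))
print(sums)

# Shortcut that relies on the X columns holding only 0 and 1:
# multiplying by Y zeroes out the rows where Xi is 0
sums_alt <- colSums(df$Y * df[x_cols])
print(sums_alt)
```

The sapply version works for any condition inside the brackets, while the colSums version is shorter but only valid for 0/1 columns.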