I have a data frame like this:
df
A B
[1,] 1 4
[2,] 2 3
[3,] 3 5
[4,] 4 7
[5,] 5 5
[6,] 6 7
[7,] 7 4
[8,] 8 7
[9,] 9 3
[10,] 10 7
I need to sum the numbers in column B that fall between a certain interval defined by column A. For example sum the value in B between A≥1 and A<3.
I'm trying to use this:
> sum(which(df$B[df$A>=1] & df$B[df$A<3]))
but it does not give me what I want.
You're almost there.
with(my.df, sum(my.df[A >= 1 & A < 3, "B"]))
EDIT
Chase challenged me to explain away the code at hand. When reading R code, it's best to read from inwards out. Here we go.
my.df
is a data.frame (think Excel sheet) with columns A
and B
. Square brackets []
are used to subset anything from this object like so: [rows, columns]
. For example, [1, ]
would return the entire first row, and if you add a column number (or column name), you get value in first row in that column (e.g. [1, 2]
, where you would get value in the first row of the second column).
We will now subset rows in my.df
with A >= 1 & A < 3
. What we're saying here is we want to see those rows that have values in A
bigger or equal to 1 and smaller than 3. This will give us all rows that satisfy this criterion. If we add , "B"
it means we want to select column B
. Since we already subsetted the result to contain only rows that fit the above criterion, by entering column name B
we get values only in the column. Once you have those values from column B
you sum them using sum()
.
Use of with
function is there to make our life easier. If we hadn't used that, we would be forced to call columns by their full name my.df$A
and my.df$B
.
Another option using the select parameter from subset achieves identical results. : sum(subset(dat, A < 3 & A >= 1, select = "B"))
Roman's answer is faster, but arguably not as easy to deparse to the non R-expert. Depending on your audience (i.e. sharing script with colleagues) and performance constraints, one may be preferred to the other.
> system.time(replicate(100,with(dat, sum(dat[A >= 1 & A < 3, "B"]))))
user system elapsed
0.008 0.000 0.008
> system.time(replicate(100, sum(subset(dat, A < 3 & A >= 1, select = "B"))))
user system elapsed
0.041 0.000 0.041
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With