Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I sum values in a column that match a given condition using pandas?

Suppose I have a column like so:

a   b   1   5    1   7 2   3 1   3 2   5 

I want to sum up the values for b where a = 1, for example. This would give me 5 + 7 + 3 = 15.

How do I do this in pandas?

like image 589
adijo Avatar asked Jan 30 '15 12:01

adijo


People also ask

How do I sum values based on criteria in another column in Excel?

If you want, you can apply the criteria to one range and sum the corresponding values in a different range. For example, the formula =SUMIF(B2:B5, "John", C2:C5) sums only the values in the range C2:C5, where the corresponding cells in the range B2:B5 equal "John."

How do you use between conditions in pandas?

Boolean Series in PandasThe between() function is used to get boolean Series equivalent to left <= series <= right. This function returns a boolean vector containing True wherever the corresponding Series element is between the boundary values left and right. NA values are treated as False.


1 Answers

The essential idea here is to select the data you want to sum, and then sum them. This selection of data can be done in several different ways, a few of which are shown below.

Boolean indexing

Arguably the most common way to select the values is to use Boolean indexing.

With this method, you find out where column 'a' is equal to 1 and then sum the corresponding rows of column 'b'. You can use loc to handle the indexing of rows and columns:

>>> df.loc[df['a'] == 1, 'b'].sum() 15 

The Boolean indexing can be extended to other columns. For example if df also contained a column 'c' and we wanted to sum the rows in 'b' where 'a' was 1 and 'c' was 2, we'd write:

df.loc[(df['a'] == 1) & (df['c'] == 2), 'b'].sum() 

Query

Another way to select the data is to use query to filter the rows you're interested in, select column 'b' and then sum:

>>> df.query("a == 1")['b'].sum() 15 

Again, the method can be extended to make more complicated selections of the data:

df.query("a == 1 and c == 2")['b'].sum() 

Note this is a little more concise than the Boolean indexing approach.

Groupby

The alternative approach is to use groupby to split the DataFrame into parts according to the value in column 'a'. You can then sum each part and pull out the value that the 1s added up to:

>>> df.groupby('a')['b'].sum()[1] 15 

This approach is likely to be slower than using Boolean indexing, but it is useful if you want check the sums for other values in column a:

>>> df.groupby('a')['b'].sum() a 1    15 2     8 
like image 122
Alex Riley Avatar answered Oct 16 '22 02:10

Alex Riley