Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python count dataframe column values meeting condition

What would be a more elegant way to writing:

df[df['income'] > 0].count()['income']

I would like to simply count the number of column values meeting a condition (in this example, the condition is just being larger than zero, but I would like a way applicable to any arbitrary condition or set of conditions). Obviously more elegant if the column name would not need to show up twice in the expression. Should be hopefully easy.

like image 354
matanster Avatar asked Oct 22 '25 23:10

matanster


1 Answers

df = pd.DataFrame([0, 30000, 75000, -300, 23000], columns=['income'])
print(df)
   income
0       0
1   30000
2   75000
3    -300
4   23000

If you would like to count values in a column meeting a slightly more complex condition than just being positive, for example "value is in the range from 5000 to 25000", you can use two methods.

First, using boolean indexing,

((df['income'] > 5000) & (df['income'] < 25000)).sum()

Second, applying a function on every row of the series,

df['income'].map(lambda x: 5000 < x < 25000).sum()

Note that the second approach allows arbitrarily complex condition but is much slower than the first approach which is using low-level operations on the underlying arrays. See the documentation on boolean indexing for more information.

like image 170
fokkerplanck Avatar answered Oct 25 '25 13:10

fokkerplanck