
pandas: filtering out columns based on total sum and average

Tags:

pandas

I have a dataframe of timeseries data with numerical data in columns. When plotting this data, I only want to plot the series that are considered significant. Here's how I select which columns to plot:

df.loc[:,  (df.iloc[-1] >= 100) & (df.sum() >= 1000)]

In other words, the criterion for "significant" is that the total sum of values in a series is at least 1000 and the most recent value is at least 100.

This however turned out to be insufficient. What I need instead is that the sum is over 1000 (as before), but I want the average of the last two rows (the two most recent readings) to be over 100.

How do I change the filter above to compute the average?

In:

date           A    B    C   D
2016-04-01    80  235   99   0
2016-04-02    85  295  153  14
2016-04-03   111  363  224  14
2016-04-04   111  379  296  50
2016-04-05    11   51   29   5

Out:

date           B    C
2016-04-01   235   99
2016-04-02   295  153
2016-04-03   363  224
2016-04-04   379  296
2016-04-05    51   29
Dmitry B. asked May 26 '16 06:05



1 Answer

You just need to change the slice to the last two rows (df.iloc[-2:]) and call .mean(), which averages each column:

df.loc[:, (df.sum() >= 1000) & (df.iloc[-2:].mean() >= 100)]

(There seems to be a mistake in your example. Input and output are different for the last row.)
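For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the input table from the question and applies the filter. The helper name select_significant and its parameters (min_total, min_recent_mean, n_recent) are my own for illustration and are not part of pandas. It assumes the rows are already sorted by date in ascending order, so that iloc[-2:] really picks the two most recent readings.

```python
import pandas as pd


def select_significant(df, min_total=1000, min_recent_mean=100, n_recent=2):
    """Keep columns whose total and recent average both meet the thresholds.

    Hypothetical helper: the name and parameters are illustrative only.
    Assumes rows are ordered oldest to newest.
    """
    # Column-wise sum over all rows: a boolean Series indexed by column name.
    total_ok = df.sum() >= min_total
    # Column-wise mean of the last n_recent rows, indexed the same way.
    recent_ok = df.iloc[-n_recent:].mean() >= min_recent_mean
    # Both masks share the column index, so they combine element-wise.
    return df.loc[:, total_ok & recent_ok]


# Input table copied from the question.
df = pd.DataFrame(
    {
        "A": [80, 85, 111, 111, 11],
        "B": [235, 295, 363, 379, 51],
        "C": [99, 153, 224, 296, 29],
        "D": [0, 14, 14, 50, 5],
    },
    index=pd.to_datetime(
        ["2016-04-01", "2016-04-02", "2016-04-03", "2016-04-04", "2016-04-05"]
    ),
)
df.index.name = "date"

selected = select_significant(df)
print(selected)

# Same filter with a lower total threshold (an arbitrary value for comparison).
relaxed = select_significant(df, min_total=800)
print(relaxed)
```

On this sample input, with the thresholds exactly as stated (1000 and 100), only column B is kept, because C sums to 801. Lowering min_total to 800 keeps both B and C.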

ayhan answered Oct 10 '22 01:10
