I have a dataframe of timeseries data with numerical data in columns. When plotting this data, I only want to plot certain series that are considered to be significant. Here's how I select which columns to plot:
df.loc[:, (df.iloc[-1] >= 100) & (df.sum() >= 1000)]
In other words, the criterion for "significant" is that the total sum of values in a series is over 1000 and the most recent value is at least 100.
This, however, turned out to be insufficient. What I need instead is that the sum is over 1000 (as before), but I want the average of the last two rows (the two most recent readings) to be over 100.
How do I change the filter above to compute the average?
In:
date A B C D
2016-04-01 80 235 99 0
2016-04-02 85 295 153 14
2016-04-03 111 363 224 14
2016-04-04 111 379 296 50
2016-04-05 11 51 29 5
Out:
date B C
2016-04-01 235 99
2016-04-02 295 153
2016-04-03 363 224
2016-04-04 379 296
2016-04-05 51 29
You just need to change the slice (df.iloc[-2:]) and call .mean():
df.loc[:, (df.sum() >= 1000) & (df.iloc[-2:].mean() >= 100)]
(There seems to be a mistake in your example. Input and output are different for the last row.)
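As a quick check, here is a minimal sketch that rebuilds the sample input from the question and applies the filter. The way the frame is constructed and the names total_ok, recent_ok and selected are my own assumptions for illustration, not something from the question. With the numbers exactly as posted, only column B passes both tests: C averages 162.5 over its last two rows, but its total is 801, which is below 1000.

```python
import pandas as pd

# Sample input copied from the question; building it this way is an assumption.
df = pd.DataFrame(
    {
        "A": [80, 85, 111, 111, 11],
        "B": [235, 295, 363, 379, 51],
        "C": [99, 153, 224, 296, 29],
        "D": [0, 14, 14, 50, 5],
    },
    index=pd.to_datetime(
        ["2016-04-01", "2016-04-02", "2016-04-03", "2016-04-04", "2016-04-05"]
    ),
)
df.index.name = "date"

# Per-column boolean Series: total over the whole series is at least 1000.
total_ok = df.sum() >= 1000

# Per-column boolean Series: mean of the last two rows is at least 100.
recent_ok = df.iloc[-2:].mean() >= 100

# Keep only the columns that satisfy both conditions.
selected = df.loc[:, total_ok & recent_ok]

print(list(selected.columns))  # ['B']
```

Both masks are Series indexed by column name, so combining them with & and passing the result to df.loc[:, ...] selects columns rather than rows. If you really want a strict "over", use > instead of >=.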