With pandas.DataFrame.resample I can downsample a DataFrame:
df.resample("3s", how="mean")
This resamples a data frame with a datetime-like index so that all rows falling within each 3-second window are aggregated into one row. The column values are averaged.
Question: I have a data frame with multiple columns. Is it possible to specify a different aggregation function for each column? For example, I want to "sum" column x, take the "mean" of column y, and pick the "last" value for column z. How can I achieve that effect?
I know I could create a new empty data frame and then call resample three times, but I would prefer a faster in-place solution.
You can use .agg after resample. By passing a dictionary that maps column names to functions, you can aggregate each column with a different function.
Try this:
df.resample("3s").agg({'x':'sum','y':'mean','z':'last'})
Also, how is deprecated:
C:\Program Files\Anaconda3\lib\site-packages\ipykernel__main__.py:1: FutureWarning: how in .resample() is deprecated the new syntax is .resample(...).mean()
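As a minimal sketch of the syntax the warning points to, the aggregation is chained as a method on the resampler instead of being passed through how. The toy frame and its column name x are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical toy data: six samples, one per second
idx = pd.date_range("2017-01-01", periods=6, freq="s")
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6]}, index=idx)

# Deprecated form: df.resample("3s", how="mean")
# Current form: call the aggregation on the resampler object
out = df.resample("3s").mean()
print(out)
```

Each 3-second window collapses to one row holding the mean of the samples in that window.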
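Consider the dataframe df:
import numpy as np
import pandas as pd

np.random.seed([3,1415])
tidx = pd.date_range('2017-01-01', periods=18, freq='s')  # lowercase 's'; uppercase 'S' is deprecated in recent pandas
df = pd.DataFrame(np.random.rand(len(tidx), 3), tidx, list('XYZ'))
print(df)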
X Y Z
2017-01-01 00:00:00 0.444939 0.407554 0.460148
2017-01-01 00:00:01 0.465239 0.462691 0.016545
2017-01-01 00:00:02 0.850445 0.817744 0.777962
2017-01-01 00:00:03 0.757983 0.934829 0.831104
2017-01-01 00:00:04 0.879891 0.926879 0.721535
2017-01-01 00:00:05 0.117642 0.145906 0.199844
2017-01-01 00:00:06 0.437564 0.100702 0.278735
2017-01-01 00:00:07 0.609862 0.085823 0.836997
2017-01-01 00:00:08 0.739635 0.866059 0.691271
2017-01-01 00:00:09 0.377185 0.225146 0.435280
2017-01-01 00:00:10 0.700900 0.700946 0.796487
2017-01-01 00:00:11 0.018688 0.700566 0.900749
2017-01-01 00:00:12 0.764869 0.253200 0.548054
2017-01-01 00:00:13 0.778883 0.651676 0.136097
2017-01-01 00:00:14 0.544838 0.035073 0.275079
2017-01-01 00:00:15 0.706685 0.713614 0.776050
2017-01-01 00:00:16 0.542329 0.836541 0.538186
2017-01-01 00:00:17 0.185523 0.652151 0.746060
Use agg with a mapping from column name to aggregation function:
df.resample('3s').agg(dict(X='sum', Y='mean', Z='last'))
X Y Z
2017-01-01 00:00:00 1.760624 0.562663 0.777962
2017-01-01 00:00:03 1.755516 0.669204 0.199844
2017-01-01 00:00:06 1.787061 0.350861 0.691271
2017-01-01 00:00:09 1.096773 0.542220 0.900749
2017-01-01 00:00:12 2.088590 0.313316 0.275079
2017-01-01 00:00:15 1.434538 0.734102 0.746060
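To tie this back to the question, here is a hedged sketch that checks the single agg call against the "call resample three times" approach. It uses its own random data (the seed 0 is arbitrary, so the numbers differ from the table above), and the names combined and separate are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, illustration only
tidx = pd.date_range("2017-01-01", periods=18, freq="s")
df = pd.DataFrame(rng.random((len(tidx), 3)), index=tidx, columns=list("XYZ"))

# One resample with a per-column mapping
combined = df.resample("3s").agg({"X": "sum", "Y": "mean", "Z": "last"})

# The alternative from the question: one resample per column, then stitch together
separate = pd.concat(
    [
        df["X"].resample("3s").sum(),
        df["Y"].resample("3s").mean(),
        df["Z"].resample("3s").last(),
    ],
    axis=1,
)

# Both approaches should give the same index and the same values
same_index = combined.index.equals(separate.index)
same_values = np.allclose(combined.to_numpy(), separate.to_numpy())
print(same_index, same_values)
```

The dictionary form keeps everything in one expression and avoids building the intermediate pieces by hand.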