Here is an example:
# Generate some random time series dataframe with 'price' and 'volume'
x = pd.date_range('2017-01-01', periods=100, freq='1min')
df_x = pd.DataFrame({'price': np.random.randint(50, 100, size=x.shape), 'vol': np.random.randint(1000, 2000, size=x.shape)}, index=x)
df_x.head(10)
price vol
2017-01-01 00:00:00 56 1544
2017-01-01 00:01:00 70 1680
2017-01-01 00:02:00 92 1853
2017-01-01 00:03:00 94 1039
2017-01-01 00:04:00 81 1180
2017-01-01 00:05:00 70 1443
2017-01-01 00:06:00 56 1621
2017-01-01 00:07:00 68 1093
2017-01-01 00:08:00 59 1684
2017-01-01 00:09:00 86 1591
# Here is some example aggregate function:
df_x.resample('5Min').agg({'price': 'mean', 'vol': 'sum'}).head()
price vol
2017-01-01 00:00:00 78.6 7296
2017-01-01 00:05:00 67.8 7432
2017-01-01 00:10:00 76.0 9017
2017-01-01 00:15:00 74.0 6989
2017-01-01 00:20:00 64.4 8078
However, if I want to extract other aggregated info depends on more than one column, what can I do?
For example, I want to append 2 more columns here, called all_up
and all_down
.
These 2 columns' calculations are defined as follows:
In every 5 minutes, how many times the 1-minute sampled price went down and vol went down, call this column all_down
, and how many times they are went up, call this column all_up
.
Here is what I expect the 2 columns look like:
price vol all_up all_down
2017-01-01 00:00:00 78.6 7296 2 0
2017-01-01 00:05:00 67.8 7432 0 0
2017-01-01 00:10:00 76.0 9017 1 0
2017-01-01 00:15:00 74.0 6989 1 1
2017-01-01 00:20:00 64.4 8078 0 2
This functionality depends on 2 columns. But in the agg
function in the Resampler
object, it seems that it only accept 3 kinds of functions:
str
or a function that applies to each of the columns separately.list
of functions that applies to each of the columns separately.dict
with keys matches the column names. Still only apply the value which is a function to a single column each time.All these functionalities seem doesn't meet my needs.
I think you need instead resample
use groupby
+ Grouper
and apply
with custom function:
def func(x):
#code
a = x['price'].mean()
#custom function working with 2 columns
b = (x['price'] / x['vol']).mean()
return pd.Series([a,b], index=['col1','col2'])
df_x.groupby(pd.Grouper(freq='5Min')).apply(func)
Or use resample
for all supported aggreagate functions and join outputs together with outputs of custom function:
def func(x):
#custom function
b = (x['price'] / x['vol']).mean()
return b
df1 = df_x.groupby(pd.Grouper(freq='5Min')).apply(func)
df2 = df_x.resample('5Min').agg({'price': 'mean', 'vol': 'sum'}).head()
df = pd.concat([df1, df2], axis=1)
EDIT: For check decreasing and increasing is used function diff
and compare with 0
, join both condition with &
and count by sum
:
def func(x):
v = x['vol'].diff().fillna(0)
p = x['price'].diff().fillna(0)
m1 = (v > 0) & (p > 0)
m2 = (v < 0) & (p < 0)
return pd.Series([m1.sum(), m2.sum()], index=['all_up','all_down'])
df1 = df_x.groupby(pd.Grouper(freq='5min')).apply(func)
print (df1)
all_up all_down
2017-01-01 00:00:00 2 0
2017-01-01 00:05:00 0 0
df2 = df_x.resample('5Min').agg({'price': 'mean', 'vol': 'sum'}).head()
df = pd.concat([df2, df1], axis=1)
print (df)
vol price all_up all_down
2017-01-01 00:00:00 7296 78.6 2 0
2017-01-01 00:05:00 7432 67.8 0 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With