Featuretools already supports handling multiple cutoff times: https://docs.featuretools.com/automated_feature_engineering/handling_time.html
In [20]: temporal_cutoffs = ft.make_temporal_cutoffs(cutoffs['customer_id'],
....: cutoffs['cutoff_time'],
....: window_size='3d',
....: num_windows=2)
....:
In [21]: temporal_cutoffs
Out[21]:
        time  instance_id
0 2011-12-12        13458
1 2011-12-15        13458
2 2012-10-02        13602
3 2012-10-05        13602
4 2012-01-22        15222
5 2012-01-25        15222
In [22]: entityset = ft.demo.load_retail()
In [23]: feature_tensor, feature_defs = ft.dfs(entityset=entityset,
....: target_entity='customers',
....: cutoff_time=temporal_cutoffs,
....: cutoff_time_in_index=True,
....: max_features=4)
....:
In [24]: feature_tensor
Out[24]:
                        MAX(order_products.total)  MIN(order_products.unit_price)  STD(order_products.quantity)  COUNT(order_products)
customer_id time
13458.0     2011-12-12                    201.960                          0.3135                     10.053804                    394
            2011-12-15                    201.960                          0.3135                     10.053804                    394
15222.0     2012-01-22                    272.250                          1.1880                     26.832816                      5
            2012-01-25                    272.250                          1.1880                     26.832816                      5
13602.0     2012-10-02                     49.896                          1.0395                      8.732068                     23
            2012-10-05                     49.896                          1.0395                      8.732068                     23
But as you can see, when one ID has multiple points in time, a pandas MultiIndex is generated. How (maybe via a pivot?) can I instead get all the generated MIN/MAX/... columns prefixed with last_x_days_MIN/MAX/..., so that I get additional features per cutoff window? For example:
initial feature 1, initial feature 2, time_frame_1_<AGGTYPE2>_Feature, time_frame_1_<AGGTYPE1>_Feature, time_frame_2_<AGGTYPE1>_Feature, time_frame_2_<AGGTYPE2>_Feature, time_frame_2_<AGGTYPE1>_Feature, time_frame_2_<AGGTYPE1>_Feature
You can achieve this by making two calls to ft.calculate_feature_matrix with different training_window values and joining the resulting feature matrices together. For example,
import featuretools as ft
import pandas as pd

entityset = ft.demo.load_retail()

cutoffs = pd.DataFrame({
    'customer_name': ["Micheal Nicholson", "Krista Maddox"],
    'cutoff_time': [pd.Timestamp('2011-10-14'), pd.Timestamp('2011-08-18')]
})

feature_defs = ft.dfs(entityset=entityset,
                      target_entity='customers',
                      agg_primitives=["sum"],
                      trans_primitives=[],
                      max_features=1,
                      features_only=True)

fm_60_days = ft.calculate_feature_matrix(entityset=entityset,
                                         features=feature_defs,
                                         cutoff_time=cutoffs,
                                         training_window="60 days")

fm_30_days = ft.calculate_feature_matrix(entityset=entityset,
                                         features=feature_defs,
                                         cutoff_time=cutoffs,
                                         training_window="30 days")

fm_60_days.merge(fm_30_days, left_index=True, right_index=True, suffixes=("__60_days", "__30_days"))
The code above returns the following DataFrame, in which the same feature is calculated once using the last 60 days of data and once using the last 30 days.
                   SUM(order_products.quantity)__60_days  SUM(order_products.quantity)__30_days
customer_name
Krista Maddox                                        466                                    306
Micheal Nicholson                                    710                                    539
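If you need more than two windows, the same join can be wrapped in a small helper. The sketch below is only an illustration: `join_window_matrices` is a hypothetical name, not part of Featuretools, and the two input frames are hand-built stand-ins using the values shown above. In practice each frame would come from its own ft.calculate_feature_matrix call with a different training_window.

```python
import pandas as pd


def join_window_matrices(matrices):
    """Join feature matrices side by side, suffixing columns with a window label.

    `matrices` maps a label such as "60_days" to a DataFrame indexed by customer.
    (Hypothetical helper, not a Featuretools function.)
    """
    renamed = [fm.add_suffix(f"__{label}") for label, fm in matrices.items()]
    # Concatenating along axis=1 aligns the rows on the shared index.
    return pd.concat(renamed, axis=1)


# Stand-in feature matrices built from the output shown above.
index = pd.Index(["Krista Maddox", "Micheal Nicholson"], name="customer_name")
fm_60_days = pd.DataFrame({"SUM(order_products.quantity)": [466, 710]}, index=index)
fm_30_days = pd.DataFrame({"SUM(order_products.quantity)": [306, 539]}, index=index)

combined = join_window_matrices({"60_days": fm_60_days, "30_days": fm_30_days})
print(combined)
```

Using add_suffix on every frame, rather than merge's suffixes argument, means a column gets its window label even if its name appears in only one of the matrices.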
Note: this example runs on the latest release of Featuretools (v0.3.1) where we updated the demo retail dataset to have interpretable names as customer ids.
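As for the pivot mentioned in the question, a MultiIndex result can also be reshaped to one row per customer with plain pandas. The following is a minimal sketch, assuming a frame shaped like the question's output; the data is a hand-built stand-in and the `time_frame_<n>_` prefix is just one possible naming scheme. Keep in mind that the pivoted columns hold the feature values as of each cutoff time, which is not the same thing as limiting the lookback with training_window.

```python
import pandas as pd

# Stand-in for a feature matrix indexed by (customer_id, time).
index = pd.MultiIndex.from_tuples(
    [
        (13458, "2011-12-12"),
        (13458, "2011-12-15"),
        (15222, "2012-01-22"),
        (15222, "2012-01-25"),
    ],
    names=["customer_id", "time"],
)
feature_tensor = pd.DataFrame(
    {
        "COUNT(order_products)": [394, 394, 5, 5],
        "MAX(order_products.total)": [201.96, 201.96, 272.25, 272.25],
    },
    index=index,
)

# Number the cutoff times per customer (1 = earliest), because the actual
# timestamps differ from customer to customer.
flat = feature_tensor.reset_index().sort_values(["customer_id", "time"])
flat["window"] = flat.groupby("customer_id").cumcount() + 1

# Move the window number into the columns, then flatten the column MultiIndex.
wide = (
    flat.drop(columns="time")
    .set_index(["customer_id", "window"])
    .unstack("window")
)
wide.columns = [f"time_frame_{window}_{feature}" for feature, window in wide.columns]
print(wide)
```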