Calculating same features with multiple training windows in Featuretools

Question

Featuretools already supports handling multiple cutoff times: https://docs.featuretools.com/automated_feature_engineering/handling_time.html

In [20]: temporal_cutoffs = ft.make_temporal_cutoffs(cutoffs['customer_id'],
   ....:                                             cutoffs['cutoff_time'],
   ....:                                             window_size='3d',
   ....:                                             num_windows=2)
   ....: 

In [21]: temporal_cutoffs
Out[21]: 
        time  instance_id
0 2011-12-12        13458
1 2011-12-15        13458
2 2012-10-02        13602
3 2012-10-05        13602
4 2012-01-22        15222
5 2012-01-25        15222

In [22]: entityset = ft.demo.load_retail()

In [23]: feature_tensor, feature_defs = ft.dfs(entityset=entityset,
   ....:                                       target_entity='customers',
   ....:                                       cutoff_time=temporal_cutoffs,
   ....:                                       cutoff_time_in_index=True,
   ....:                                       max_features=4)
   ....: 

In [24]: feature_tensor
Out[24]: 
                        MAX(order_products.total)  MIN(order_products.unit_price)  STD(order_products.quantity)  COUNT(order_products)
customer_id time                                                                                                                      
13458.0     2011-12-12                    201.960                          0.3135                     10.053804                    394
            2011-12-15                    201.960                          0.3135                     10.053804                    394
15222.0     2012-01-22                    272.250                          1.1880                     26.832816                      5
            2012-01-25                    272.250                          1.1880                     26.832816                      5
13602.0     2012-10-02                     49.896                          1.0395                      8.732068                     23
            2012-10-05                     49.896                          1.0395                      8.732068                     23

But as you can see, a pandas MultiIndex is generated when one ID has multiple points in time. How (maybe via a pivot?) can I instead get all the generated MIN/MAX/... columns prefixed as last_x_days_MIN/MAX/..., so that I get additional features per cutoff window?

Edit: desired output format

initial feature 1,initial feature 2, time_frame_1_<AGGTYPE2>_Feature,time_frame_1_<AGGTYPE1>_Feature,time_frame_2_<AGGTYPE1>_Feature,time_frame_2_<AGGTYPE2>_Feature,time_frame_2_<AGGTYPE1>_Feature,time_frame_2_<AGGTYPE1>_Feature

Max Kanter · Accepted Answer

You can achieve this by making two calls to ft.calculate_feature_matrix with different training_window values and joining the resulting feature matrices together. For example:

import featuretools as ft
import pandas as pd

entityset = ft.demo.load_retail()

cutoffs = pd.DataFrame({
      'customer_name': ["Micheal Nicholson", "Krista Maddox"],
      'cutoff_time': [pd.Timestamp('2011-10-14'), pd.Timestamp('2011-08-18')]
    })

feature_defs = ft.dfs(entityset=entityset,
                      target_entity='customers',
                      agg_primitives=["sum"],
                      trans_primitives=[],
                      max_features=1,
                      features_only=True)

# Calculate the same feature definitions twice, once per training window.
fm_60_days = ft.calculate_feature_matrix(entityset=entityset,
                                         features=feature_defs,
                                         cutoff_time=cutoffs,
                                         training_window="60 days")

fm_30_days = ft.calculate_feature_matrix(entityset=entityset,
                                         features=feature_defs,
                                         cutoff_time=cutoffs,
                                         training_window="30 days")

fm_60_days.merge(fm_30_days, left_index=True, right_index=True, suffixes=("__60_days", "__30_days"))

The code above returns this DataFrame, in which the same feature is calculated once using the last 60 days of data and once using the last 30 days of data before each cutoff time.

                  SUM(order_products.quantity)__60_days  SUM(order_products.quantity)__30_days
customer_name                                                                                  
Krista Maddox                                        466                                    306
Micheal Nicholson                                    710                                    539
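
If more than two windows are needed, the merge step can be generalized. The following is only a sketch using plain pandas: combine_window_matrices is a hypothetical helper (not part of Featuretools), and the two small frames are stand-ins built from the values shown above. In practice each frame would come from its own ft.calculate_feature_matrix call with a different training_window.

```python
import pandas as pd


def combine_window_matrices(matrices):
    """Join feature matrices column-wise, tagging columns with a window label.

    `matrices` maps a label such as "60_days" to a DataFrame indexed by the
    same instance ids. Hypothetical helper, not a Featuretools function.
    """
    labelled = [fm.add_suffix("__" + label) for label, fm in matrices.items()]
    # concat with axis=1 aligns rows on the index before joining columns
    return pd.concat(labelled, axis=1)


# Stand-in frames with the values from the output above (assumption: real
# frames would be produced by ft.calculate_feature_matrix).
index = pd.Index(["Krista Maddox", "Micheal Nicholson"], name="customer_name")
fm_60_days = pd.DataFrame({"SUM(order_products.quantity)": [466, 710]}, index=index)
fm_30_days = pd.DataFrame({"SUM(order_products.quantity)": [306, 539]}, index=index)

combined = combine_window_matrices({"60_days": fm_60_days, "30_days": fm_30_days})
print(combined)
```

Unlike merge with suffixes, which only renames columns that collide, add_suffix labels every column, so the window is always visible in the column name.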

Note: this example runs on the latest release of Featuretools (v0.3.1), where we updated the demo retail dataset to have interpretable names as customer ids.
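
As for the pivot idea raised in the question: a feature tensor with a (customer_id, time) MultiIndex can be reshaped into one row per customer with plain pandas. This is a rough sketch, not a Featuretools feature. The frame below is a stand-in built from a few values in the question's output, and the time_frame_N_ prefix follows the asker's desired format. Note that each resulting column corresponds to a different cutoff time, not a different training window, so it is not equivalent to the training_window approach above.

```python
import pandas as pd

# Stand-in for part of the feature_tensor shown in the question (assumption:
# the real one would come from ft.dfs with cutoff_time_in_index=True).
index = pd.MultiIndex.from_tuples(
    [
        (13458, pd.Timestamp("2011-12-12")),
        (13458, pd.Timestamp("2011-12-15")),
        (15222, pd.Timestamp("2012-01-22")),
        (15222, pd.Timestamp("2012-01-25")),
    ],
    names=["customer_id", "time"],
)
feature_tensor = pd.DataFrame(
    {
        "COUNT(order_products)": [394, 394, 5, 5],
        "MAX(order_products.total)": [201.96, 201.96, 272.25, 272.25],
    },
    index=index,
)

# Number the cutoffs per customer (0 = earliest), because the actual
# timestamps differ between customers and cannot be used as column keys.
flat = feature_tensor.sort_index().reset_index()
flat["window"] = flat.groupby("customer_id").cumcount()

# Move the window number from the rows into the columns.
wide = (
    flat.drop(columns="time")
    .set_index(["customer_id", "window"])
    .unstack("window")
)

# Flatten the (feature, window) column MultiIndex into prefixed names.
wide.columns = [f"time_frame_{window + 1}_{name}" for name, window in wide.columns]
print(wide)
```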

Tags: python, pandas, feature-extraction, featuretools, feature-engineering

Georg Heiler
