
How to apply Deep Feature Synthesis to a single table

Tags:

featuretools

After processing, my data is one table with several columns that are features and one column which is a label. I would like to use featuretools.dfs to help me predict the label. Is it possible to do it directly, or do I need to split my single table into multiple?

The Anh Nguyen asked May 03 '18 02:05

1 Answer

It is possible to run DFS on a single table. For example, if you have a pandas DataFrame df with a unique 'index' column and a datetime column 'date', you would write:

import featuretools as ft

es = ft.EntitySet('Transactions')

# 'index' must uniquely identify each row; 'date' is used as the time index
es.entity_from_dataframe(dataframe=df,
                         entity_id='log',
                         index='index',
                         time_index='date')

fm, features = ft.dfs(entityset=es,
                      target_entity='log',
                      trans_primitives=['day', 'weekday', 'month'])

The generated feature matrix will look like

In [1]: fm
Out[1]: 
             location  pies sold  WEEKDAY(date)  MONTH(date)  DAY(date)
index                                                                  
1         main street          3              4           12         29
2         main street          4              5           12         30
3         main street          5              6           12         31
4      arlington ave.         18              0            1          1
5      arlington ave.          1              1            1          2

This applies “transform” primitives to your data. To use aggregation primitives as well, you usually want to give ft.dfs more entities, as shown below. You can read about the difference between the two kinds of primitives in our documentation.
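If you want to see which primitives are transforms and which are aggregations, featuretools ships a helper that lists them; a quick check (assuming a reasonably recent featuretools version):

import featuretools as ft

# Each row is one primitive; the 'type' column distinguishes
# 'transform' from 'aggregation' primitives
primitives = ft.list_primitives()
print(primitives[primitives['type'] == 'transform'].head())
print(primitives[primitives['type'] == 'aggregation'].head())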

A standard workflow is to normalize your single entity by an interesting categorical column. If your df were the single table

| index | location       | pies sold | date       |
|-------+----------------+-----------+------------|
|     1 | main street    |         3 | 2017-12-29 |
|     2 | main street    |         4 | 2017-12-30 |
|     3 | main street    |         5 | 2017-12-31 |
|     4 | arlington ave. |        18 | 2018-01-01 |
|     5 | arlington ave. |         1 | 2018-01-02 |

you would probably be interested in normalizing by location:

es.normalize_entity(base_entity_id='log',
                    new_entity_id='locations',
                    index='location')

Your new entity locations would have the table

| location       | first_log_time |
|----------------+----------------|
| main street    |     2017-12-29 |
| arlington ave. |     2018-01-01 |
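You can inspect that table directly; in the 0.x-era featuretools API used in this answer, each entity exposes its underlying dataframe (a small check, assuming the entityset from above):

es['locations'].df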

The new locations entity lets DFS build features like locations.SUM(log.pies sold) or locations.MEAN(log.pies sold), which sum or average all values by location. You can see these features created in the full example below:

In [1]: import pandas as pd
   ...: import featuretools as ft
   ...: df = pd.DataFrame({'index': [1, 2, 3, 4, 5],
   ...:                    'location': ['main street',
   ...:                                 'main street',
   ...:                                 'main street',
   ...:                                 'arlington ave.',
   ...:                                 'arlington ave.'],
   ...:                    'pies sold': [3, 4, 5, 18, 1]})
   ...: df['date'] = pd.date_range('12/29/2017', periods=5, freq='D')
   ...: df
   ...: 

Out[1]: 
   index        location  pies sold       date
0      1     main street          3 2017-12-29
1      2     main street          4 2017-12-30
2      3     main street          5 2017-12-31
3      4  arlington ave.         18 2018-01-01
4      5  arlington ave.          1 2018-01-02

In [2]: es = ft.EntitySet('Transactions')
   ...: es.entity_from_dataframe(dataframe=df, entity_id='log', index='index',
   ...:                          time_index='date')
   ...: es.normalize_entity(base_entity_id='log', new_entity_id='locations',
   ...:                     index='location')
   ...: 
Out[2]: 
Entityset: Transactions
  Entities:
    log [Rows: 5, Columns: 4]
    locations [Rows: 2, Columns: 2]
  Relationships:
    log.location -> locations.location

In [3]: fm, features = ft.dfs(entityset=es,
   ...:                       target_entity='log',
   ...:                       agg_primitives=['sum', 'mean'],
   ...:                       trans_primitives=['day'])
   ...: fm
   ...: 
Out[3]: 
             location  pies sold  DAY(date)  locations.DAY(first_log_time)  locations.MEAN(log.pies sold)  locations.SUM(log.pies sold)
index                                                                                                                                  
1         main street          3         29                             29                            4.0                            12
2         main street          4         30                             29                            4.0                            12
3         main street          5         31                             29                            4.0                            12
4      arlington ave.         18          1                              1                            9.5                            19
5      arlington ave.          1          2                              1                            9.5                            19
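Once you have fm, it is an ordinary pandas DataFrame, so you can feed it to any scikit-learn estimator to predict your label. A minimal sketch, assuming your label survived into fm under the column name 'label' (that name and the RandomForestClassifier choice are illustrative, not part of featuretools):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# 'label' is an assumed column name for your target
y = fm['label']

# One-hot encode categorical columns such as 'location' so that
# scikit-learn can consume the feature matrix
X = pd.get_dummies(fm.drop(columns=['label']))

clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)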
Max Kanter answered Oct 06 '22 23:10