Featuretools: Can it be applied on a single table to generate features even when there is no datetime related column?

The featuretools documentation states in its very first sentence:

"Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning."

This seems to imply that a dataset must have a datetime column. I just want to confirm whether that is actually so. For example, can I not use it on the 'iris' dataset to generate new features? If a dataset need not have a time variable, how would I use Featuretools to generate features on the 'iris' dataset? I would be grateful for a reply. Thanks.

Ashok K Harnal asked Sep 20 '18 05:09


1 Answer

Featuretools works on relational datasets with or without datetimes, so to answer your question: yes, it can make features for a single table that has no datetime column. The iris dataset is a single table, and none of its columns is an obvious candidate on which to normalize (that is, to make a new table from the unique values of a column in an existing table), so you would use transform primitives to make new features.

  1. Make an EntitySet
  2. Add a single entity
  3. Run Deep Feature Synthesis using whichever transform primitives you want.

Here is a complete working example:

from sklearn.datasets import load_iris
import pandas as pd
import featuretools as ft

# Load data and put into dataframe
iris = load_iris()
df = pd.DataFrame(iris.data, columns = iris.feature_names)
df['species'] = iris.target
df['species'] = df['species'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})

# Make an entityset and add the entity
# (entity_from_dataframe and target_entity are the pre-1.0 Featuretools API;
#  releases from 1.0 on use add_dataframe and target_dataframe_name instead)
es = ft.EntitySet(id='iris')
es.entity_from_dataframe(entity_id='data', dataframe=df,
                         make_index=True, index='index')

# Run deep feature synthesis with transformation primitives
feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity='data',
                                      trans_primitives=['add_numeric', 'multiply_numeric'])

feature_matrix.head()

First five rows of the feature matrix, shown transposed so that each feature is one line (the headers 0 to 4 are the values of the index column):

      0      1      2      3      4  feature
    5.1    4.9    4.7    4.6    5.0  sepal length (cm)
    3.5    3.0    3.2    3.1    3.6  sepal width (cm)
    1.4    1.4    1.3    1.5    1.4  petal length (cm)
    0.2    0.2    0.2    0.2    0.2  petal width (cm)
 setosa setosa setosa setosa setosa  species
    3.7    3.2    3.4    3.3    3.8  petal width (cm) + sepal width (cm)
    1.6    1.6    1.5    1.7    1.6  petal length (cm) + petal width (cm)
    6.5    6.3    6.0    6.1    6.4  petal length (cm) + sepal length (cm)
    4.9    4.4    4.5    4.6    5.0  petal length (cm) + sepal width (cm)
    8.6    7.9    7.9    7.7    8.6  sepal length (cm) + sepal width (cm)
    5.3    5.1    4.9    4.8    5.2  petal width (cm) + sepal length (cm)
   4.90   4.20   4.16   4.65   5.04  petal length (cm) * sepal width (cm)
  17.85  14.70  15.04  14.26  18.00  sepal length (cm) * sepal width (cm)
   1.02   0.98   0.94   0.92   1.00  petal width (cm) * sepal length (cm)
   0.70   0.60   0.64   0.62   0.72  petal width (cm) * sepal width (cm)
   7.14   6.86   6.11   6.90   7.00  petal length (cm) * sepal length (cm)
   0.28   0.28   0.26   0.30   0.28  petal length (cm) * petal width (cm)
  31.82  25.28  26.86  25.41  32.68  petal width (cm) + sepal width (cm) * sepal length (cm) + sepal width (cm)
  18.87  15.68  15.98  15.18  19.00  petal width (cm) + sepal width (cm) * sepal length (cm)
   0.32   0.32   0.30   0.34   0.32  petal length (cm) + petal width (cm) * petal width (cm)
  27.03  24.99  23.03  22.08  26.00  petal width (cm) + sepal length (cm) * sepal length (cm)
   7.42   7.14   6.37   7.20   7.28  petal length (cm) * petal width (cm) + sepal length (cm)
   1.72   1.58   1.58   1.54   1.72  petal width (cm) * sepal length (cm) + sepal width (cm)
  22.75  18.90  19.20  18.91  23.04  petal length (cm) + sepal length (cm) * sepal width (cm)
   8.16   7.84   7.05   7.82   8.00  petal length (cm) + petal width (cm) * sepal length (cm)
  24.05  20.16  20.40  20.13  24.32  petal length (cm) + sepal length (cm) * petal width (cm) + sepal width (cm)
  55.90  49.77  47.40  46.97  55.04  petal length (cm) + sepal length (cm) * sepal length (cm) + sepal width (cm)
  12.04  11.06  10.27  11.55  12.04  petal length (cm) * sepal length (cm) + sepal width (cm)
  42.14  34.76  35.55  35.42  43.00  petal length (cm) + sepal width (cm) * sepal length (cm) + sepal width (cm)
  24.99  21.56  21.15  21.16  25.00  petal length (cm) + sepal width (cm) * sepal length (cm)
  10.40  10.08   9.00  10.37  10.24  petal length (cm) + petal width (cm) * petal length (cm) + sepal length (cm)
   1.30   1.26   1.20   1.22   1.28  petal length (cm) + sepal length (cm) * petal width (cm)
  17.15  13.20  14.40  14.26  18.00  petal length (cm) + sepal width (cm) * sepal width (cm)
   7.84   7.04   6.75   7.82   8.00  petal length (cm) + petal width (cm) * petal length (cm) + sepal width (cm)
  30.10  23.70  25.28  23.87  30.96  sepal length (cm) + sepal width (cm) * sepal width (cm)
  34.45  32.13  29.40  29.28  33.28  petal length (cm) + sepal length (cm) * petal width (cm) + sepal length (cm)
  45.58  40.29  38.71  36.96  44.72  petal width (cm) + sepal length (cm) * sepal length (cm) + sepal width (cm)
   5.92   5.12   5.10   5.61   6.08  petal length (cm) + petal width (cm) * petal width (cm) + sepal width (cm)
  25.97  22.44  22.05  22.08  26.00  petal length (cm) + sepal width (cm) * petal width (cm) + sepal length (cm)
   9.10   8.82   7.80   9.15   8.96  petal length (cm) * petal length (cm) + sepal length (cm)
   0.74   0.64   0.68   0.66   0.76  petal width (cm) * petal width (cm) + sepal width (cm)
  13.76  12.64  11.85  13.09  13.76  petal length (cm) + petal width (cm) * sepal length (cm) + sepal width (cm)
   5.18   4.48   4.42   4.95   5.32  petal length (cm) * petal width (cm) + sepal width (cm)
   0.98   0.88   0.90   0.92   1.00  petal length (cm) + sepal width (cm) * petal width (cm)
   2.24   2.24   1.95   2.55   2.24  petal length (cm) * petal length (cm) + petal width (cm)
   5.60   4.80   4.80   5.27   5.76  petal length (cm) + petal width (cm) * sepal width (cm)
  33.15  30.87  28.20  28.06  32.00  petal length (cm) + sepal length (cm) * sepal length (cm)
  18.55  15.30  15.68  14.88  18.72  petal width (cm) + sepal length (cm) * sepal width (cm)
   6.86   6.16   5.85   6.90   7.00  petal length (cm) * petal length (cm) + sepal width (cm)
  12.95   9.60  10.88  10.23  13.68  petal width (cm) + sepal width (cm) * sepal width (cm)
  31.85  27.72  27.00  28.06  32.00  petal length (cm) + sepal length (cm) * petal length (cm) + sepal width (cm)
  43.86  38.71  37.13  35.42  43.00  sepal length (cm) * sepal length (cm) + sepal width (cm)
   1.06   1.02   0.98   0.96   1.04  petal width (cm) * petal width (cm) + sepal length (cm)
  18.13  14.08  15.30  15.18  19.00  petal length (cm) + sepal width (cm) * petal width (cm) + sepal width (cm)
   8.48   8.16   7.35   8.16   8.32  petal length (cm) + petal width (cm) * petal width (cm) + sepal length (cm)
  19.61  16.32  16.66  15.84  19.76  petal width (cm) + sepal length (cm) * petal width (cm) + sepal width (cm)
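
To make the column names in the table above concrete, here is a minimal sketch in plain Python (no Featuretools required) that reproduces a few of the values for index 0. The dictionary `row0` and the helper `pairwise` are illustrative names and not part of Featuretools; the four measurements are copied from the table. Note that a name such as "petal width (cm) + sepal width (cm) * sepal length (cm)" denotes an add_numeric feature multiplied by an original column, i.e. (0.2 + 3.5) * 5.1, and should not be read with ordinary operator precedence.

```python
from itertools import combinations

# Measurements for index 0, copied from the feature matrix above.
row0 = {
    "sepal length (cm)": 5.1,
    "sepal width (cm)": 3.5,
    "petal length (cm)": 1.4,
    "petal width (cm)": 0.2,
}

def pairwise(values, op, symbol):
    """Apply `op` to every unordered pair of columns (hypothetical helper)."""
    out = {}
    for (name_a, a), (name_b, b) in combinations(sorted(values.items()), 2):
        out[f"{name_a} {symbol} {name_b}"] = round(op(a, b), 2)
    return out

# First level: what add_numeric and multiply_numeric yield on the raw columns.
sums = pairwise(row0, lambda a, b: a + b, "+")
products = pairwise(row0, lambda a, b: a * b, "*")

# Second level: a sum feature multiplied by an original column.
stacked = round(
    sums["petal width (cm) + sepal width (cm)"] * row0["sepal length (cm)"], 2
)

print(sums["petal width (cm) + sepal width (cm)"])
print(products["sepal length (cm) * sepal width (cm)"])
print(stacked)
```

Four numeric columns give six unordered pairs, which is why the matrix has six "+" columns and six "*" columns built directly from the raw measurements; the remaining columns are products that involve at least one of the sum features.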

For more on primitives, and for a list of all the transform primitives, see the Featuretools documentation.

Featuretools works best for relational datasets with many tables, but it can also work well for single tables. There are several demos working from a single table where an entity is normalized in order to create multiple tables. The taxi trip duration project is a good example of this method.

The iris dataset has only 4 numerical features (assuming you are predicting the species), and none of them directly provides values on which you can normalize. However, you could apply a clustering technique such as KMeans to the numerical features and then create an entity based on the cluster assignments. The predict remaining useful life project has an example of this technique.
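
Here is a rough sketch of the clustering idea, using only scikit-learn and pandas. It assigns each flower to a KMeans cluster and then computes per-cluster aggregates by hand, which approximates what normalizing on the cluster column and running aggregation primitives would produce. The choice of 3 clusters, the `cluster` column name, and the `MEAN(...)`/`STD(...)` style of feature names are assumptions made for illustration; they are not taken from the remaining useful life project.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Assign each row to a cluster; n_clusters=3 is an arbitrary choice here.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(df[iris.feature_names])

# The "parent" table: one row per cluster with aggregated measurements.
clusters = df.groupby("cluster")[iris.feature_names].agg(["mean", "std"])
clusters.columns = [f"{stat.upper()}({col})" for col, stat in clusters.columns]

# Join the per-cluster aggregates back onto each flower as new features.
features = df.join(clusters, on="cluster")
print(features.shape)
```

In Featuretools itself, the `cluster` column would be the one to normalize on, so that Deep Feature Synthesis can build the aggregation features across the resulting relationship.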

willk answered Nov 11 '22 06:11