I'm getting started on a TensorFlow project, and am in the middle of defining and creating my feature columns. However, I have hundreds and hundreds of features; it's a pretty extensive dataset. Even after preprocessing and scrubbing, I have a lot of columns.
The traditional way of creating a feature_column is described in the TensorFlow tutorial and even in this StackOverflow post: you essentially declare and initialize a TensorFlow object for each feature column:
gender = tf.feature_column.categorical_column_with_vocabulary_list(
    "gender", ["Female", "Male"])
This is all well and good if your dataset has only a few columns, but in my case I certainly don't want hundreds of lines of code initializing different feature_column objects.
What's the best way to resolve this issue? I notice that in the tutorial, all the columns are collected as a list:
base_columns = [
    gender, native_country, education, occupation, workclass, relationship,
    age_buckets,
]
Which is ultimately passed into your estimator:
m = tf.estimator.LinearClassifier(
    model_dir=model_dir, feature_columns=base_columns)
So would the ideal way of handling feature_column creation for hundreds of columns be to append them directly into a list? Something like this?
my_columns = []
for col in df.columns:
    if is_string_dtype(df[col]):  # is_string_dtype is a pandas function
        my_column.append(tf.feature_column.categorical_column_with_hash_bucket(
            col, hash_bucket_size=len(df[col].unique())))
    elif is_numeric_dtype(df[col]):  # is_numeric_dtype is a pandas function
        my_column.append(tf.feature_column.numeric_column(col))
Is this the best way of creating these feature columns? Or am I missing some functionality in TensorFlow that allows me to work around this step?
Think of feature columns as the intermediaries between raw data and Estimators. Feature columns are very rich, enabling you to transform a diverse range of raw data into formats that Estimators can use, allowing easy experimentation. In simple words, a feature column is the bridge between the raw data and the estimator or model.
A feature column describes a set of transformations applied to the raw inputs. For example, tf.feature_column.numeric_column('b') passes the numeric feature b through as-is, while to "bucketize" a feature a you wrap its numeric column in tf.feature_column.bucketized_column, which represents discretized dense input bucketed by boundaries.
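For illustration, that bucketization looks roughly like this (a minimal sketch; the feature name 'a' and the boundary values are placeholders, not from the original post):

import tensorflow as tf

# Raw numeric input for a feature called "a".
a_numeric = tf.feature_column.numeric_column('a')

# Discretize "a" into four buckets: (-inf, 0), [0, 10), [10, 100), [100, +inf).
a_bucketized = tf.feature_column.bucketized_column(
    source_column=a_numeric,
    boundaries=[0, 10, 100])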
What you have posted in the question makes sense. Here is a small extension based on your own code:
import pandas.api.types as ptypes

my_columns = []
for col in df.columns:
    if ptypes.is_string_dtype(df[col]):
        # Hash string values into a fixed number of buckets.
        my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(
            col, hash_bucket_size=len(df[col].unique())))
    elif ptypes.is_numeric_dtype(df[col]):
        my_columns.append(tf.feature_column.numeric_column(col))
    elif ptypes.is_categorical_dtype(df[col]):
        # Pandas categoricals already carry their vocabulary.
        my_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(
            col, list(df[col].cat.categories)))
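The resulting my_columns list can then be handed to an estimator just like base_columns in the question. A minimal sketch, assuming a TF 1.x setup in which df holds the features and labels is a separate pandas Series (both names are placeholders):

# Feed the DataFrame through TF 1.x's pandas input function; df and labels
# are placeholder names, not part of the answer above.
train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=df, y=labels, batch_size=128, num_epochs=None, shuffle=True)

model = tf.estimator.LinearClassifier(feature_columns=my_columns)
model.train(input_fn=train_input_fn, steps=1000)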
I used your own answer. I just edited it a little bit (it should be my_columns instead of my_column in the for loop) and am posting it the way it worked for me.
import pandas.api.types as ptypes

my_columns = []
for col in df.columns:
    if ptypes.is_string_dtype(df[col]):  # is_string_dtype is a pandas function
        my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(
            col, hash_bucket_size=len(df[col].unique())))
    elif ptypes.is_numeric_dtype(df[col]):  # is_numeric_dtype is a pandas function
        my_columns.append(tf.feature_column.numeric_column(col))
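As a side note, if these columns are later fed to a deep estimator such as DNNClassifier instead of a linear one, the hashed categorical columns have to be wrapped first. A minimal sketch (the 'occupation' and 'age' feature names, bucket size, and embedding dimension are assumptions for illustration):

# Categorical columns must be wrapped in indicator_column or embedding_column
# before a DNN-based estimator can consume them.
occupation = tf.feature_column.categorical_column_with_hash_bucket(
    'occupation', hash_bucket_size=1000)
occupation_emb = tf.feature_column.embedding_column(occupation, dimension=8)

dnn = tf.estimator.DNNClassifier(
    feature_columns=[occupation_emb, tf.feature_column.numeric_column('age')],
    hidden_units=[128, 64])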