TensorFlow feature column for a variable-length list of values

From the TensorFlow docs it's clear how to use tf.feature_column.categorical_column_with_vocabulary_list to create a feature column that takes a single string as input and outputs a one-hot vector. For example:

import tensorflow as tf

vocabulary_feature_column = (
    tf.feature_column.categorical_column_with_vocabulary_list(
        key="vocab_feature",
        vocabulary_list=["kitchenware", "electronics", "sports"]))

Let's say "kitchenware" maps to [1,0,0] and "electronics" maps to [0,1,0]. My question is related to having a list of strings as a feature. For example, if the feature value was ["kitchenware","electronics"] then the desired output would be [1,1,0]. The input list length is not fixed but the output dimension is.

The use case is a straight bag-of-words type model (obviously with a much larger vocabulary list!).

What is the correct way to implement this?

GratefulGuest asked Feb 09 '18 02:02



2 Answers

Here is an example of how to feed data to the indicator column:

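import tensorflow as tf  # TensorFlow 1.x API (tf.Session, input_layer)

# Each row is a list of strings; here every row has two values.
features = {'letter': [['A','A'], ['C','D'], ['E','F'], ['G','A'], ['X','R']]}

# Map strings to ids; anything outside the vocabulary is dropped.
letter_feature = tf.feature_column.categorical_column_with_vocabulary_list(
                "letter", ["A", "B", "C"], dtype=tf.string)

# Turn the ids of each row into one dense vector of vocabulary size.
indicator = tf.feature_column.indicator_column(letter_feature)
tensor = tf.feature_column.input_layer(features, [indicator])

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    # The vocabulary lookup table must be initialized before use.
    session.run(tf.tables_initializer())
    print(session.run([tensor]))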

Which outputs:

[array([[2., 0., 0.],
       [0., 0., 1.],
       [0., 0., 0.],
       [1., 0., 0.],
       [0., 0., 0.]], dtype=float32)]
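
Note that the first row comes out as [2., 0., 0.]: repeated values are counted rather than clipped to 1, and values outside the vocabulary ("D", "E", "X", ...) contribute nothing. As a rough illustration of that behaviour, here is a pure-Python sketch with no TensorFlow involved. The helper name `multi_hot` and its `binary` flag are made up for this example and are not part of any library; setting `binary=True` gives the strict [1,1,0]-style vector the question asks for.

```python
def multi_hot(values, vocabulary, binary=False):
    """Encode a variable-length list of strings as a fixed-length vector.

    Hypothetical helper for illustration only; not a TensorFlow function.
    """
    index = {token: i for i, token in enumerate(vocabulary)}
    vector = [0.0] * len(vocabulary)
    for value in values:
        position = index.get(value)
        if position is None:
            continue  # out-of-vocabulary values are dropped
        vector[position] += 1.0  # repeated values accumulate as counts
    if binary:
        # Clip counts to 1 to get a strict bag-of-words indicator.
        vector = [min(v, 1.0) for v in vector]
    return vector


vocab = ["A", "B", "C"]
rows = [["A", "A"], ["C", "D"], ["E", "F"], ["G", "A"], ["X", "R"]]

counts = [multi_hot(row, vocab) for row in rows]
binary = [multi_hot(row, vocab, binary=True) for row in rows]

print(counts)  # same values as the session output above
print(binary)  # first row becomes [1.0, 0.0, 0.0]
```

If the model needs strictly binary inputs, the same clipping could presumably be applied to the dense tensor after the indicator column, but that step is an assumption and is not shown in the answer above.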
jamborta answered Sep 17 '22 14:09


You should use tf.feature_column.indicator_column; see https://www.tensorflow.org/versions/master/api_docs/python/tf/feature_column/indicator_column
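
One practical wrinkle when the input lists really do vary in length: a plain nested Python list has to be rectangular before it can be turned into a dense tensor, so shorter rows need padding. A minimal sketch, assuming it is acceptable to pad with a string that is not in the vocabulary (the empty string here), since out-of-vocabulary values contribute nothing to the indicator output shown in the other answer. `pad_rows` is a hypothetical helper, not a TensorFlow function.

```python
def pad_rows(rows, pad_token=""):
    """Pad every row to the length of the longest row.

    Hypothetical helper; pad_token is assumed NOT to be in the vocabulary.
    """
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [pad_token] * (width - len(row)) for row in rows]


rows = [["kitchenware", "electronics"], ["sports"], []]
padded = pad_rows(rows)
print(padded)
```

The padded list could then be passed as the feature value in place of the ragged one. Passing a sparse tensor instead of padding is another option, but it is not covered here.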

Spirit_Dongdong answered Sep 20 '22 14:09