TensorFlow: ValueError: Duplicate feature column key found for column

Question

I was trying to get my hands dirty with TensorFlow by following the Wide and Deep Learning example code. I modified some imports so that it works with Python 3.4 on CentOS 7.

Highlights of the changes are:

    -import urllib
    +import urllib.request

...

    -urllib.urlretrieve
    +urllib.request.urlretrieve

...
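
If the script has to run under both Python 2 and Python 3, the import change above can be wrapped in a fallback. This is a minimal sketch and not part of the tutorial; the URL and destination path in the commented-out call are placeholders.

```python
# Import urlretrieve from wherever the running interpreter keeps it.
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

# Hypothetical usage; the URL and destination are placeholders.
# urlretrieve("https://example.com/adult.data", "/tmp/adult.data")
```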

When I run the code, I get the following error:

    Training data is downloaded to /tmp/tmpw06u4_xl
    Test data is downloaded to /tmp/tmpjliqxhwh
    model directory = /tmp/tmpcyll7kck
    WARNING:tensorflow:Setting feature info to {'education': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'capital_gain': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(32561)]), is_sparse=False), 'capital_loss': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(32561)]), is_sparse=False), 'hours_per_week': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(32561)]), is_sparse=False), 'gender': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'occupation': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'native_country': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'race': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'age': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(32561)]), is_sparse=False), 'education_num': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(32561)]), is_sparse=False), 'marital_status': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'workclass': TensorSignature(dtype=tf.string, shape=None, is_sparse=True), 'relationship': TensorSignature(dtype=tf.string, shape=None, is_sparse=True)}
    WARNING:tensorflow:Setting targets info to TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(32561)]), is_sparse=False)
    Traceback (most recent call last):
      File "wide_n_deep_tutorial.py", line 213, in <module>
        tf.app.run()
      File "/usr/lib/python3.4/site-packages/tensorflow/python/platform/app.py", line 30, in run
        sys.exit(main(sys.argv))
      File "wide_n_deep_tutorial.py", line 209, in main
        train_and_eval()
      File "wide_n_deep_tutorial.py", line 202, in train_and_eval
        m.fit(input_fn=lambda: input_fn(df_train), steps=FLAGS.train_steps)
      File "/usr/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 240, in fit
        max_steps=max_steps)
      File "/usr/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 550, in _train_model
        train_op, loss_op = self._get_train_ops(features, targets)
      File "/usr/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py", line 182, in _get_train_ops
        logits = self._logits(features, is_training=True)
      File "/usr/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py", line 260, in _logits
        dnn_feature_columns = self._get_dnn_feature_columns()
      File "/usr/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py", line 224, in _get_dnn_feature_columns
        feature_column_ops.check_feature_columns(self._dnn_feature_columns)
      File "/usr/lib/python3.4/site-packages/tensorflow/contrib/layers/python/layers/feature_column_ops.py", line 318, in check_feature_columns
        f.name))
    ValueError: Duplicate feature column key found for column: education_embedding. This usually means that the column is almost identical to another column, and one must be discarded.

Do I need to change some variable, or is this a Python 3 problem? How can I move forward with this tutorial?

Jesse Pangburn · Accepted Answer

Final update: I had this problem with the recommended 0.10rc0 branch, but after reinstalling from master (no branch specified on git clone) the problem went away. I checked the source code and it has been fixed. Python 3 now gets the same results as Python 2 for wide_n_deep mode, after applying the urllib.request fix you already mentioned.

For anyone coming later and still using the 0.10rc0 branch, feel free to read on:

I had the same problem and did some debugging. It looks like a bug in tensorflow/contrib/layers/python/layers/feature_column.py, in the _EmbeddingColumn class. The key property is affected by this Python bug: https://bugs.python.org/issue24931

So instead of producing a unique key, every _EmbeddingColumn instance ends up with the same key: '_EmbeddingColumn()'
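
The collapse to a single key can be illustrated without TensorFlow. The sketch below assumes, as the answer suggests, that the key is assembled from a per-instance field dictionary that comes back empty on the affected interpreter. To reproduce that effect on any Python version it reads the instance `__dict__` directly, which is empty for a namedtuple subclass that does not declare `__slots__`. `FakeEmbeddingColumn` and its fields are hypothetical stand-ins, not TensorFlow's actual `_EmbeddingColumn`.

```python
from collections import namedtuple


class FakeEmbeddingColumn(namedtuple("FakeEmbeddingColumn",
                                     ["source_name", "dimension"])):
    """Hypothetical stand-in for a namedtuple-based feature column."""

    @property
    def key_from_dict(self):
        # Build the key from the instance __dict__. For a namedtuple
        # subclass without __slots__ this dict is empty, so the field
        # values never make it into the key.
        fields = ", ".join("{}={}".format(k, v)
                           for k, v in sorted(self.__dict__.items()))
        return "{}({})".format(type(self).__name__, fields)

    @property
    def key_from_fields(self):
        # Build the key from the namedtuple fields instead.
        fields = ", ".join("{}={}".format(name, getattr(self, name))
                           for name in self._fields)
        return "{}({})".format(type(self).__name__, fields)


workclass = FakeEmbeddingColumn("workclass", 8)
education = FakeEmbeddingColumn("education", 8)

print(workclass.key_from_dict)    # FakeEmbeddingColumn()
print(education.key_from_dict)    # FakeEmbeddingColumn()
print(workclass.key_from_fields)  # FakeEmbeddingColumn(source_name=workclass, dimension=8)
```

Two different columns share the first kind of key, while the second kind stays distinct.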

This causes the check_feature_columns() function in feature_column_ops.py to treat the second _EmbeddingColumn instance as a duplicate, since the keys of all of them are the same.
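
A simplified duplicate check shows why identical keys are fatal. This is a sketch of the idea only, not the code in feature_column_ops.py; `Column` and `check_columns` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical column type: a display name plus the key used for de-duplication.
Column = namedtuple("Column", ["name", "key"])


def check_columns(columns):
    """Raise ValueError if two columns share the same key."""
    seen_keys = set()
    for column in columns:
        if column.key in seen_keys:
            raise ValueError(
                "Duplicate feature column key found for column: {}".format(
                    column.name))
        seen_keys.add(column.key)


# Distinct names, but the same degenerate key for both columns.
columns = [
    Column("workclass_embedding", "_EmbeddingColumn()"),
    Column("education_embedding", "_EmbeddingColumn()"),
]

try:
    check_columns(columns)
except ValueError as err:
    print(err)  # Duplicate feature column key found for column: education_embedding
```

The first column registers its key, and the second one trips the check even though it wraps a different feature.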

I'm kind of a Python noob, and I couldn't figure out how to monkey patch a property. So I worked around the problem by creating a subclass at the top of the wide_n_deep tutorial file:

    # EmbeddingColumn for Python 3.4 has a problem with the key property.
    # Can't monkey patch a property, so subclass it and provide a factory
    # function to use instead of "embedding_column".
    from tensorflow.contrib.layers.python.layers.feature_column import _EmbeddingColumn

    class _MonkeyEmbeddingColumn(_EmbeddingColumn):
      # override the key property
      @property
      def key(self):
        return "{}".format(self)

    def monkey_embedding_column(sparse_id_column,
                                dimension,
                                combiner="mean",
                                initializer=None,
                                ckpt_to_load_from=None,
                                tensor_name_in_ckpt=None):
      return _MonkeyEmbeddingColumn(sparse_id_column, dimension, combiner,
                                    initializer, ckpt_to_load_from,
                                    tensor_name_in_ckpt)

Then find the calls like this:

    tf.contrib.layers.embedding_column(workclass, dimension=8)

and replace "tf.contrib.layers." with "monkey_" so you now have:

    deep_columns = [
        monkey_embedding_column(workclass, dimension=8),
        monkey_embedding_column(education, dimension=8),
        monkey_embedding_column(marital_status, dimension=8),
        monkey_embedding_column(gender, dimension=8),
        monkey_embedding_column(relationship, dimension=8),
        monkey_embedding_column(race, dimension=8),
        monkey_embedding_column(native_country, dimension=8),
        monkey_embedding_column(occupation, dimension=8),
        age,
        education_num,
        capital_gain,
        capital_loss,
        hours_per_week,
    ]

So now it uses the _MonkeyEmbeddingColumn class with the modified key property (which works like all the other key properties in feature_column.py). This lets the code run to completion, but I'm not 100% sure it's correct, as it reports the accuracy as:

    accuracy: 0.818316

As this is slightly worse than the wide-only training, I wonder whether Python 2 gets the same accuracy or whether my fix lowers the accuracy by causing a training problem.

Update: I installed in Python 2, and wide_n_deep gets over 0.85 accuracy there, so this "fix" lets the code run but seems to be doing the wrong thing. I'll debug to see what Python 2 gets for these values and whether it can be fixed properly in Python 3. I'm curious too.
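
On the monkey-patching question raised earlier: a property is an ordinary class attribute, so in general it can be replaced by assigning a new `property` object to the class. The sketch below uses a hypothetical `Widget` class, not a TensorFlow type; whether this would behave any better than the subclass approach for `_EmbeddingColumn` has not been tested here.

```python
class Widget(object):
    """Hypothetical class whose `key` property returns a useless constant."""

    def __init__(self, name):
        self.name = name

    @property
    def key(self):
        return "Widget()"


def patched_key(self):
    # Replacement getter that includes the instance's name in the key.
    return "Widget(name={})".format(self.name)


# Assign a new property object on the class; existing and future
# instances both pick up the patched getter.
Widget.key = property(patched_key)

print(Widget("a").key)  # Widget(name=a)
```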

Tags: python, machine-learning, tensorflow, deep-learning, python-3.4

ajay0221
