Upweight a Category

Tags:

python

machine-learning

tensorflow

deep-learning

I have built a TensorFlow model that uses a DNNClassifier to classify input into two categories.

My problem is that Outcome 1 occurs 90-95% of the time. As a result, TensorFlow is giving me the same probabilities for all of my predictions.

I am trying to predict the other outcome (that is, a false positive for Outcome 2 is preferable to missing a possible occurrence of Outcome 2). I know that in machine learning in general it would be worthwhile to upweight Outcome 2 in this case.

However, I don't know how to do this in TensorFlow. The documentation alludes to it being possible, but I can't find any examples of what it would actually look like. Has anyone successfully done this, or does anyone know where I could find some example code or a thorough explanation (I'm using Python)?

Note: I have seen exposed weights being manipulated when someone is using the more fundamental parts of TensorFlow and not an estimator. For maintenance reasons, I need to do this using an estimator.

709

asked Jan 04 '18 15:01

Abigail Fox

1 Answer

The tf.estimator.DNNClassifier constructor has a weight_column argument:

weight_column: A string or a _NumericColumn created by tf.feature_column.numeric_column defining feature column representing weights. It is used to down weight or boost examples during training. It will be multiplied by the loss of the example. If it is a string, it is used as a key to fetch weight tensor from the features. If it is a _NumericColumn, raw tensor is fetched by key weight_column.key, then weight_column.normalizer_fn is applied on it to get weight tensor.

So add a new feature column that holds one weight per example, and give examples of the rare class a larger weight:

weight = tf.feature_column.numeric_column('weight')
...
tf.estimator.DNNClassifier(..., weight_column=weight)
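
As a minimal sketch of how the values for that column might be built, the snippet below derives one weight per example from the labels using NumPy only. The helper name make_example_weights, the label encoding (1 for the rare Outcome 2), and the factor 10.0 are assumptions chosen for illustration; none of them come from the TensorFlow documentation.

```python
import numpy as np


def make_example_weights(labels, rare_class=1, rare_weight=10.0):
    """Return one weight per example: rare_weight for the rare class, 1.0 otherwise.

    Hypothetical helper; rare_class and rare_weight are illustrative defaults.
    """
    labels = np.asarray(labels)
    return np.where(labels == rare_class, rare_weight, 1.0).astype(np.float32)


# Toy labels: class 0 is the common outcome, class 1 the rare one (assumed encoding).
train_y = np.array([0, 0, 0, 1, 0, 0, 0, 0, 1, 0])
weights = make_example_weights(train_y)
```

The resulting array has the same length as the labels, so it can be supplied under the 'weight' key of the features dictionary that the input function returns.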

[Update] Here's a complete working example:

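# Note: this example targets TensorFlow 1.x; the tensorflow.examples.tutorials
# module used below was removed in TensorFlow 2.x.
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data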

mnist = input_data.read_data_sets('mnist', one_hot=False)
train_x, train_y = mnist.train.next_batch(1024)
test_x, test_y = mnist.test.images, mnist.test.labels

x_column = tf.feature_column.numeric_column('x', shape=[784])
weight_column = tf.feature_column.numeric_column('weight')
classifier = tf.estimator.DNNClassifier(feature_columns=[x_column],
                                        hidden_units=[100, 100],
                                        weight_column=weight_column,
                                        n_classes=10)

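# Training
# The weights here are all ones, so every example counts equally;
# use larger values for examples of the class you want to upweight.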
train_input_fn = tf.estimator.inputs.numpy_input_fn(x={'x': train_x, 'weight': np.ones(train_x.shape[0])},
                                                    y=train_y.astype(np.int32),
                                                    num_epochs=None, shuffle=True)
classifier.train(input_fn=train_input_fn, steps=1000)

# Testing
test_input_fn = tf.estimator.inputs.numpy_input_fn(x={'x': test_x, 'weight': np.ones(test_x.shape[0])},
                                                   y=test_y.astype(np.int32),
                                                   num_epochs=1, shuffle=False)
acc = classifier.evaluate(input_fn=test_input_fn)
print('Test Accuracy: %.3f' % acc['accuracy'])
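
The example above passes np.ones as the weights, which leaves every example equally weighted. One common heuristic for choosing non-uniform values, not something the answer or the TensorFlow documentation prescribes, is to weight each class inversely to its frequency: w_c = N / (K * n_c), where N is the number of examples, K the number of classes, and n_c the count of class c. The sketch below computes such weights with NumPy; the function name balanced_weights and the toy labels are assumptions for illustration.

```python
import numpy as np


def balanced_weights(labels):
    """Per-example weights inversely proportional to class frequency.

    Hypothetical helper implementing w_c = N / (K * n_c).
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    per_class = labels.size / (classes.size * counts)
    # classes is sorted, so searchsorted maps each label to its class index.
    class_index = np.searchsorted(classes, labels)
    return per_class[class_index].astype(np.float32)


# Toy data: nine examples of the common class 0, one of the rare class 1.
train_y = np.array([0] * 9 + [1])
weights = balanced_weights(train_y)
```

With this scheme each class contributes the same total weight to the loss, and the weights sum to the number of examples. An array like this could replace np.ones(train_x.shape[0]) in the training input function; whether to also weight the evaluation data depends on which metric you want to report.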

110

answered Oct 27 '22 02:10

Maxim
