Here is the code that I am trying to run:
import tensorflow as tf
import numpy as np
import input_data

filename_queue = tf.train.string_input_producer(["cs-training.csv"])
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
record_defaults = [[1], [1], [1], [1], [1], [1], [1], [1], [1], [1], [1]]
col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11 = tf.decode_csv(
    value, record_defaults=record_defaults)
features = tf.concat(0, [col2, col3, col4, col5, col6, col7, col8, col9, col10, col11])

with tf.Session() as sess:
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for i in range(1200):
        # Retrieve a single instance:
        print i
        example, label = sess.run([features, col1])
        try:
            print example, label
        except:
            pass

    coord.request_stop()
    coord.join(threads)
This code returns the error below.
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-23-e42fe2609a15> in <module>()
7 # Retrieve a single instance:
8 print i
----> 9 example, label = sess.run([features, col1])
10 try:
11 print example, label
/root/anaconda/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict)
343
344 # Run request and get response.
--> 345 results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
346
347 # User may have fetched the same tensor multiple times, but we
/root/anaconda/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_run(self, target_list, fetch_list, feed_dict)
417 # pylint: disable=protected-access
418 raise errors._make_specific_exception(node_def, op, e.error_message,
--> 419 e.code)
420 # pylint: enable=protected-access
421 raise e_type, e_value, e_traceback
InvalidArgumentError: Field 1 in record 0 is not a valid int32: 0.766126609
It then has a lot of information following it, which I think is irrelevant to the problem. Obviously the problem is that a lot of the data I am feeding to the program is not of dtype int32; it's mostly floating-point numbers. I've tried a few things to change the dtype, like explicitly setting the dtype=float argument in tf.decode_csv as well as in tf.concat. Neither worked. It's an invalid argument. To top it all off, I don't know if this code is going to actually make a prediction on the data. I want it to predict whether col1 is going to be a 1 or a 0, and I don't see anything in the code that would hint that it's going to actually make that prediction. Maybe I'll save that question for a different thread. Any help is greatly appreciated!
The interface to tf.decode_csv() is a little tricky. The dtype of each column is determined by the corresponding element of the record_defaults argument. The value for record_defaults in your code is interpreted as each column having tf.int32 as its type, which leads to an error when it encounters floating-point data.
Let's say you have the following CSV data, containing three integer columns, followed by a floating point column:
4, 8, 9, 4.5
2, 5, 1, 3.7
2, 2, 2, 0.1
Assuming all of the columns are required, you would build record_defaults as follows:
value = ...
record_defaults = [tf.constant([], dtype=tf.int32), # Column 0
tf.constant([], dtype=tf.int32), # Column 1
tf.constant([], dtype=tf.int32), # Column 2
tf.constant([], dtype=tf.float32)] # Column 3
col0, col1, col2, col3 = tf.decode_csv(value, record_defaults=record_defaults)
assert col0.dtype == tf.int32
assert col1.dtype == tf.int32
assert col2.dtype == tf.int32
assert col3.dtype == tf.float32
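To see why the defaults drive the parsing, here is a plain-Python analogue (not TensorFlow code; decode_csv_line is a hypothetical helper for illustration only): each field is cast to the type of its column's default, and a field that cannot be cast fails much like the InvalidArgumentError above.

```python
import csv
import io

def decode_csv_line(line, record_defaults):
    # Hypothetical helper, for illustration only: cast each CSV field to
    # the type of the corresponding column default, mirroring how
    # tf.decode_csv uses record_defaults to fix each column's dtype.
    fields = next(csv.reader(io.StringIO(line)))
    return [type(default)(field.strip())
            for field, default in zip(fields, record_defaults)]

row = decode_csv_line("4, 8, 9, 4.5", [0, 0, 0, 0.0])
# row == [4, 8, 9, 4.5]: three ints and a float, as the defaults dictate.
# decode_csv_line("0.766126609", [0]) would raise ValueError, the
# plain-Python cousin of the InvalidArgumentError above.
```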
An empty value in record_defaults signifies that the value is required. Alternatively, if (e.g.) column 2 is allowed to have missing values, you would define record_defaults as follows:
record_defaults = [tf.constant([], dtype=tf.int32), # Column 0
tf.constant([], dtype=tf.int32), # Column 1
tf.constant([0], dtype=tf.int32), # Column 2
tf.constant([], dtype=tf.float32)] # Column 3
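Continuing the plain-Python analogy (a sketch of the behaviour, not TensorFlow's implementation, and decode_field is a hypothetical name), an empty default means a missing field is an error, while a non-empty default is substituted when the field is blank:

```python
def decode_field(field, default, required):
    # Sketch of tf.decode_csv's missing-value handling: a blank field
    # uses the column default, unless the column is required (i.e. its
    # record_defaults entry was an empty tensor), in which case it fails.
    if field.strip() == "":
        if required:
            raise ValueError("field is required but missing")
        return default
    return type(default)(field.strip())

# Column 2 above has default [0], so a blank field yields 0:
value = decode_field("", 0, required=False)
# A required column rejects a blank field:
# decode_field("", 0, required=True) raises ValueError
```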
The second part of your question concerns how to build and train a model that predicts the value of one of the columns from the input data. Currently, the program doesn't: it simply concatenates the columns into a single tensor, called features. You will need to define and train a model that interprets that data. One of the simplest such approaches is linear regression, and you might find this tutorial on linear regression in TensorFlow adaptable to your problem.
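As a rough illustration of what such a model involves, here is a minimal logistic-regression sketch in plain NumPy (logistic rather than linear regression, since the label is binary; this is a hypothetical sketch, not the linked tutorial's code, and the toy data stands in for your real columns):

```python
import numpy as np

def train_logistic(X, y, lr=0.1, steps=1000):
    # Minimal batch gradient descent for logistic regression: learn
    # weights w and bias b so that sigmoid(X @ w + b) approximates
    # P(label == 1).
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y                      # gradient of cross-entropy w.r.t. logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# Toy stand-in for the real features: the label is 1 when the feature is large.
X = np.array([[0.1], [0.2], [0.8], [0.9]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = train_logistic(X, y)
preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
# preds == [0, 0, 1, 1]
```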
The answer to changing the dtype is to just change the defaults, like so:
record_defaults = [[1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.]]
After you do that, if you print out col1, you'll receive this message.
Tensor("DecodeCSV_43:0", shape=TensorShape([]), dtype=float32)
But there is another error that you will run into, which has been answered here. To recap the answer, the workaround is to change tf.concat to tf.pack like so.
features = tf.pack([col2, col3, col4, col5, col6, col7, col8, col9, col10, col11])
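For reference, tf.pack joins the ten scalar column tensors into a single rank-1 tensor (it was later renamed tf.stack in TensorFlow); NumPy's stack behaves the same way, as this small sketch shows:

```python
import numpy as np

# Stacking ten float32 scalars produces a length-10 vector, which is the
# shape the features tensor takes after tf.pack.
cols = [np.float32(c) for c in range(10)]
features = np.stack(cols)
# features.shape == (10,), features.dtype == float32
```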