Here is the code that I am trying to run:
import tensorflow as tf
import numpy as np
import input_data

filename_queue = tf.train.string_input_producer(["cs-training.csv"])
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
record_defaults = [[1], [1], [1], [1], [1], [1], [1], [1], [1], [1], [1]]
col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11 = tf.decode_csv(
    value, record_defaults=record_defaults)
features = tf.concat(0, [col2, col3, col4, col5, col6, col7, col8, col9, col10, col11])

with tf.Session() as sess:
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for i in range(1200):
        # Retrieve a single instance:
        print i
        example, label = sess.run([features, col1])
        try:
            print example, label
        except:
            pass

    coord.request_stop()
    coord.join(threads)
This code returns the error below.
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-23-e42fe2609a15> in <module>()
7 # Retrieve a single instance:
8 print i
----> 9 example, label = sess.run([features, col1])
10 try:
11 print example, label
/root/anaconda/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict)
343
344 # Run request and get response.
--> 345 results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
346
347 # User may have fetched the same tensor multiple times, but we
/root/anaconda/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_run(self, target_list, fetch_list, feed_dict)
417 # pylint: disable=protected-access
418 raise errors._make_specific_exception(node_def, op, e.error_message,
--> 419 e.code)
420 # pylint: enable=protected-access
421 raise e_type, e_value, e_traceback
InvalidArgumentError: Field 1 in record 0 is not a valid int32: 0.766126609
It then has a lot of information following it, which I think is irrelevant to the problem. Obviously the problem is that a lot of the data I am feeding to the program is not of dtype int32; it's mostly floating-point numbers. I've tried a few things to change the dtype, like explicitly setting the dtype=float argument in tf.decode_csv as well as in tf.concat. Neither worked. It's an invalid argument. To top it all off, I don't know if this code is going to actually make a prediction on the data. I want it to predict whether col1 is going to be a 1 or a 0, and I don't see anything in the code that would hint that it's going to actually make that prediction. Maybe I'll save that question for a different thread. Any help is greatly appreciated!
The interface to tf.decode_csv() is a little tricky. The dtype of each column is determined by the corresponding element of the record_defaults argument. The value for record_defaults in your code is interpreted as each column having tf.int32 as its type, which leads to an error when it encounters floating-point data.
Let's say you have the following CSV data, containing three integer columns, followed by a floating point column:
4, 8, 9, 4.5
2, 5, 1, 3.7
2, 2, 2, 0.1
Assuming all of the columns are required, you would build record_defaults as follows:
value = ...
record_defaults = [tf.constant([], dtype=tf.int32), # Column 0
tf.constant([], dtype=tf.int32), # Column 1
tf.constant([], dtype=tf.int32), # Column 2
tf.constant([], dtype=tf.float32)] # Column 3
col0, col1, col2, col3 = tf.decode_csv(value, record_defaults=record_defaults)
assert col0.dtype == tf.int32
assert col1.dtype == tf.int32
assert col2.dtype == tf.int32
assert col3.dtype == tf.float32
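To see why the defaults drive the parsing, here is a plain-Python analogue (not TensorFlow code; decode_csv_line is a hypothetical helper for illustration only): each field is cast to the type of its column's default, and a field that cannot be cast fails much like the InvalidArgumentError above.

```python
import csv
import io

def decode_csv_line(line, record_defaults):
    # Hypothetical helper, for illustration only: cast each CSV field to
    # the type of the corresponding column default, mirroring how
    # tf.decode_csv uses record_defaults to fix each column's dtype.
    fields = next(csv.reader(io.StringIO(line)))
    return [type(default)(field.strip())
            for field, default in zip(fields, record_defaults)]

row = decode_csv_line("4, 8, 9, 4.5", [0, 0, 0, 0.0])
# row == [4, 8, 9, 4.5]: three ints and a float, as the defaults dictate.
# decode_csv_line("0.766126609", [0]) would raise ValueError, the
# plain-Python cousin of the InvalidArgumentError above.
```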
An empty value in record_defaults signifies that the value is required. Alternatively, if (e.g.) column 2 is allowed to have missing values, you would define record_defaults as follows:
record_defaults = [tf.constant([], dtype=tf.int32), # Column 0
tf.constant([], dtype=tf.int32), # Column 1
tf.constant([0], dtype=tf.int32), # Column 2
tf.constant([], dtype=tf.float32)] # Column 3
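Continuing the plain-Python analogy (a sketch of the behaviour, not TensorFlow's implementation, and decode_field is a hypothetical name), an empty default means a missing field is an error, while a non-empty default is substituted when the field is blank:

```python
def decode_field(field, default, required):
    # Sketch of tf.decode_csv's missing-value handling: a blank field
    # uses the column default, unless the column is required (i.e. its
    # record_defaults entry was an empty tensor), in which case it fails.
    if field.strip() == "":
        if required:
            raise ValueError("field is required but missing")
        return default
    return type(default)(field.strip())

# Column 2 above has default [0], so a blank field yields 0:
value = decode_field("", 0, required=False)
# A required column rejects a blank field:
# decode_field("", 0, required=True) raises ValueError
```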
The second part of your question concerns how to build and train a model that predicts the value of one of the columns from the input data. Currently, the program doesn't: it simply concatenates the columns into a single tensor, called features. You will need to define and train a model that interprets that data. One of the simplest such approaches is linear regression, and you might find this tutorial on linear regression in TensorFlow adaptable to your problem.
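As a rough illustration of what such a model involves, here is a minimal logistic-regression sketch in plain NumPy (logistic rather than linear regression, since the label is binary; this is a hypothetical sketch, not the linked tutorial's code, and the toy data stands in for your real columns):

```python
import numpy as np

def train_logistic(X, y, lr=0.1, steps=1000):
    # Minimal batch gradient descent for logistic regression: learn
    # weights w and bias b so that sigmoid(X @ w + b) approximates
    # P(label == 1).
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y                      # gradient of cross-entropy w.r.t. logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# Toy stand-in for the real features: the label is 1 when the feature is large.
X = np.array([[0.1], [0.2], [0.8], [0.9]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = train_logistic(X, y)
preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
# preds == [0, 0, 1, 1]
```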
The answer to changing the dtype is to just change the defaults, like so:
record_defaults = [[1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.]]
After you do that, if you print out col1, you'll receive this message.
Tensor("DecodeCSV_43:0", shape=TensorShape([]), dtype=float32)
But there is another error that you will run into, which has been answered here. To recap the answer, the workaround is to change tf.concat to tf.pack like so.
features = tf.pack([col2, col3, col4, col5, col6, col7, col8, col9, col10, col11])
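For reference, tf.pack joins the ten scalar column tensors into a single rank-1 tensor (it was later renamed tf.stack in TensorFlow); NumPy's stack behaves the same way, as this small sketch shows:

```python
import numpy as np

# Stacking ten float32 scalars produces a length-10 vector, which is the
# shape the features tensor takes after tf.pack.
cols = [np.float32(c) for c in range(10)]
features = np.stack(cols)
# features.shape == (10,), features.dtype == float32
```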