Training TensorFlow for Predicting a Column in a csv file

Tags: python, csv, numpy, tensorflow

I have data structured in a CSV file. I want to predict whether the first column is a 1 or a 0, given all the other columns. How do I train a model (preferably a neural network) on all of the given data to make that prediction? Is there code that someone can show me? I've tried feeding it a numpy.ndarray, a FIFOQueue, and a DataFrame; nothing has worked yet. Here is the code I am running, up to the point where I get the error:

import tensorflow as tf
import numpy as np
from numpy import genfromtxt

data = genfromtxt('cs-training.csv',delimiter=',')

x = tf.placeholder("float", [None, 11])
W = tf.Variable(tf.zeros([11,2]))
b = tf.Variable(tf.zeros([2]))

y = tf.nn.softmax(tf.matmul(x,W) + b)
y_ = tf.placeholder("float", [None,2])

cross_entropy = -tf.reduce_sum(y_*tf.log(y))

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init)

for i in range(1000):
    batch_xs, batch_ys = data.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

At which point I run into this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-128-b48741faa01b> in <module>()
      1 for i in range(1000):
----> 2     batch_xs, batch_ys = data.train.next_batch(100)
      3     sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

AttributeError: 'numpy.ndarray' object has no attribute 'train'

Any help is greatly appreciated. All I need to do is predict whether column 1 is going to be a 1 or a 0. Even if all you do is get me past this one error, I should be able to take it from there.

EDIT: This is what the csv looks like when I print it out.

[[1,0.766126609,45,2,0.802982129,9120,13,0,6,0,2],
[0,0.957151019,40,0,0.121876201,2600,4,0,0,0,1],
[0,0.65818014,38,1,0.085113375,3042,2,1,0,0,0],
[0,0.233809776,30,0,0.036049682,3300,5,0,0,0,0]]

I'm trying to predict the first column.

asked Nov 18 '15 20:11

Ravaal

1 Answer

The following reads from a CSV file and builds a TensorFlow program. The example uses the Iris data set, since that may be a more meaningful example. However, it should probably work for your data as well.

Please note that in this example the first column will be 0, 1 or 2, since there are 3 species of iris.
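To make the label handling concrete, here is a minimal NumPy-only sketch of what the script does with that first column: it turns each label into a one-hot row, then scores it with a softmax layer and a summed cross-entropy loss. The four rows are illustrative values, not output from the script, and the variable names are my own. With weights and bias initialised to zero, as in the script, every class gets the same probability, so the starting loss is simply (number of rows) * log(number of classes).

```python
import numpy as np

# Illustrative rows (an assumption for this sketch): label first, then 4 features.
rows = np.array([
    [0, 5.1, 3.5, 1.4, 0.2],
    [1, 7.0, 3.2, 4.7, 1.4],
    [2, 6.3, 3.3, 6.0, 2.5],
    [1, 6.4, 3.2, 4.5, 1.5],
])
labels = rows[:, 0].astype(int)   # first column is the class label
features = rows[:, 1:]            # remaining columns are the features
num_classes = labels.max() + 1

# One-hot encode: row i gets a 1 in column labels[i], 0 elsewhere.
one_hot = np.zeros((len(labels), num_classes))
one_hot[np.arange(len(labels)), labels] = 1.0

# Softmax layer with zero-initialised weights and bias.
weights = np.zeros((features.shape[1], num_classes))
bias = np.zeros(num_classes)
logits = np.dot(features, weights) + bias
shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# Summed cross-entropy between the one-hot targets and the predictions.
loss = -np.sum(one_hot * np.log(probs))

print(one_hot)
print(loss)
```

For a 0/1 label like yours, the same steps apply with two classes instead of three.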

#!/usr/bin/env python
# Uses the graph-mode API of TensorFlow releases before 2.0
# (tf.placeholder, tf.Session).
import tensorflow as tf
import numpy as np
from numpy import genfromtxt

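# Build example data in CSV format, using the Iris data set
from sklearn import datasets
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split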
def buildDataFromIris():
    """Write the Iris data to train/test CSV files, label in the first column."""
    iris = datasets.load_iris()
    X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.33, random_state=42)
    with open('cs-training.csv', 'w') as f:
        for i, j in enumerate(X_train):
            k = np.append(np.array(y_train[i]), j)
            f.write(",".join([str(s) for s in k]) + '\n')
    with open('cs-testing.csv', 'w') as f:
        for i, j in enumerate(X_test):
            k = np.append(np.array(y_test[i]), j)
            f.write(",".join([str(s) for s in k]) + '\n')


# Convert to one hot
def convertOneHot(data):
    y=np.array([int(i[0]) for i in data])
    y_onehot=[0]*len(y)
    for i,j in enumerate(y):
        y_onehot[i]=[0]*(y.max() + 1)
        y_onehot[i][j]=1
    return (y,y_onehot)


buildDataFromIris()


data = genfromtxt('cs-training.csv',delimiter=',')  # Training data
test_data = genfromtxt('cs-testing.csv',delimiter=',')  # Test data

x_train=np.array([ i[1::] for i in data])
y_train,y_train_onehot = convertOneHot(data)

x_test=np.array([ i[1::] for i in test_data])
y_test,y_test_onehot = convertOneHot(test_data)


# A = number of features (4 in this example)
# B = number of classes (3 species of Iris: setosa, versicolor and virginica)
A=data.shape[1]-1 # Number of features, Note first is y
B=len(y_train_onehot[0])
tf_in = tf.placeholder("float", [None, A]) # Features
tf_weight = tf.Variable(tf.zeros([A,B]))
tf_bias = tf.Variable(tf.zeros([B]))
tf_softmax = tf.nn.softmax(tf.matmul(tf_in,tf_weight) + tf_bias)

# Training via backpropagation
tf_softmax_correct = tf.placeholder("float", [None,B])
tf_cross_entropy = -tf.reduce_sum(tf_softmax_correct*tf.log(tf_softmax))

# Train using tf.train.GradientDescentOptimizer
tf_train_step = tf.train.GradientDescentOptimizer(0.01).minimize(tf_cross_entropy)

# Add accuracy checking nodes
tf_correct_prediction = tf.equal(tf.argmax(tf_softmax,1), tf.argmax(tf_softmax_correct,1))
tf_accuracy = tf.reduce_mean(tf.cast(tf_correct_prediction, "float"))

# Initialize and run
# tf.global_variables_initializer() replaces the deprecated tf.initialize_all_variables()
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

print("...")
# Run the training
for i in range(30):
    sess.run(tf_train_step, feed_dict={tf_in: x_train, tf_softmax_correct: y_train_onehot})

    # Print accuracy on the test set after each training step
    result = sess.run(tf_accuracy, feed_dict={tf_in: x_test, tf_softmax_correct: y_test_onehot})
    print("Run {},{}".format(i, result))


"""
Below is the output
  ...
  Run 0,0.319999992847
  Run 1,0.300000011921
  Run 2,0.379999995232
  Run 3,0.319999992847
  Run 4,0.300000011921
  Run 5,0.699999988079
  Run 6,0.680000007153
  Run 7,0.699999988079
  Run 8,0.680000007153
  Run 9,0.699999988079
  Run 10,0.680000007153
  Run 11,0.680000007153
  Run 12,0.540000021458
  Run 13,0.419999986887
  Run 14,0.680000007153
  Run 15,0.699999988079
  Run 16,0.680000007153
  Run 17,0.699999988079
  Run 18,0.680000007153
  Run 19,0.699999988079
  Run 20,0.699999988079
  Run 21,0.699999988079
  Run 22,0.699999988079
  Run 23,0.699999988079
  Run 24,0.680000007153
  Run 25,0.699999988079
  Run 26,1.0
  Run 27,0.819999992847
  ...

 Ref:
 https://gist.github.com/mchirico/bcc376fb336b73f24b29#file-tensorflowiriscsv-py
"""
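As for the AttributeError itself: data.train.next_batch(100) appears to come from the data set helper used in the MNIST tutorial, and the array returned by genfromtxt has no such attribute. One way around it is to slice the array yourself. The sketch below is only an illustration under my own assumptions: the helper names split_label_and_features and iterate_minibatches are hypothetical, and the data is just the four rows you printed.

```python
import numpy as np


def split_label_and_features(data, num_classes=2):
    """Split rows shaped [label, f1, f2, ...] into features and one-hot labels."""
    labels = data[:, 0].astype(int)
    features = data[:, 1:]
    one_hot = np.eye(num_classes)[labels]  # row i is the one-hot vector for labels[i]
    return features, one_hot


def iterate_minibatches(features, one_hot, batch_size, rng):
    """Yield shuffled (features, one_hot) batches that cover the data once."""
    order = rng.permutation(len(features))
    for start in range(0, len(features), batch_size):
        idx = order[start:start + batch_size]
        yield features[idx], one_hot[idx]


# The four rows printed in the question; in practice this would be the
# array returned by genfromtxt('cs-training.csv', delimiter=',').
data = np.array([
    [1, 0.766126609, 45, 2, 0.802982129, 9120, 13, 0, 6, 0, 2],
    [0, 0.957151019, 40, 0, 0.121876201, 2600, 4, 0, 0, 0, 1],
    [0, 0.65818014, 38, 1, 0.085113375, 3042, 2, 1, 0, 0, 0],
    [0, 0.233809776, 30, 0, 0.036049682, 3300, 5, 0, 0, 0, 0],
])

features, one_hot = split_label_and_features(data)
rng = np.random.RandomState(0)  # fixed seed so the shuffling is repeatable
for batch_xs, batch_ys in iterate_minibatches(features, one_hot, 2, rng):
    # In the question's loop these would be passed as
    # feed_dict={x: batch_xs, y_: batch_ys}
    print(batch_xs.shape, batch_ys.shape)
```

Note that each printed row has 11 values, so once the label column is split off only 10 feature columns remain. The placeholder x in the question would then need shape [None, 10] and W shape [10, 2].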

I hope this helps.

answered Sep 21 '22 17:09

Mike Chirico
