
Tensorflow: Using neural network to classify positive or negative phrases

I am following through the tutorial here: https://pythonprogramming.net/train-test-tensorflow-deep-learning-tutorial/

I can get the Neural Network trained and print out the accuracy.

However, I do not know how to use the Neural Network to make a prediction.

Here is my attempt. The issue is specifically this line; I believe the problem is that I cannot get my input string into the format the model expects:

features = get_features_for_input("This was the best store i've ever seen.")
result = (sess.run(tf.argmax(prediction.eval(feed_dict={x:features}),1)))

Here is a larger listing:

def train_neural_network(x):
    prediction = neural_network_model(x)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y)) 
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        for epoch in range(hm_epochs):
            epoch_loss = 0
            i = 0
            while i < len(train_x):
                start = i
                end = i + batch_size

                batch_x = np.array(train_x[start:end])
                batch_y = np.array(train_y[start:end])

                _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})

                epoch_loss += c 
                i+=batch_size

            print('Epoch', epoch, 'completed out of', hm_epochs, 'loss:', epoch_loss)

        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y,1))        
        accuracy = tf.reduce_mean(tf.cast(correct,'float'))
        print('Accuracy', accuracy.eval({x:test_x, y:test_y}))

        # pos: [1,0] , argmax: 0
        # neg: [0,1] , argmax: 1
        features = get_features_for_input("This was the best store i've ever seen.")
        result = (sess.run(tf.argmax(prediction.eval(feed_dict={x:features}),1)))
        if result[0] == 0:
            print('Positive:',input_data)
        elif result[0] == 1:
            print('Negative:',input_data)

def get_features_for_input(input):
    current_words = word_tokenize(input.lower())
    current_words = [lemmatizer.lemmatize(i) for i in current_words]
    features = np.zeros(len(lexicon))

    for word in current_words:
        if word.lower() in lexicon:
            index_value = lexicon.index(word.lower())
            # OR DO +=1, test both
            features[index_value] += 1

    features = np.array(list(features))

train_neural_network(x)
asked Mar 22 '17 by Robben_Ford_Fan_boy



3 Answers

Following your comment above, it looks like your error ValueError: Cannot feed value of shape () comes from features being None, because your function get_features_for_input doesn't return anything.

I added the return features line and gave features a correct shape of [1, len(lexicon)] to match the shape of the placeholder.

def get_features_for_input(input):
    current_words = word_tokenize(input.lower())
    current_words = [lemmatizer.lemmatize(i) for i in current_words]
    features = np.zeros((1, len(lexicon)))

    for word in current_words:
        if word.lower() in lexicon:
            index_value = lexicon.index(word.lower())
            # OR DO +=1, test both
            features[0, index_value] += 1

    return features
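
With that change, the prediction snippet from the question should work, assuming the placeholder x was defined as in the tutorial with shape [None, len(lexicon)]. A minimal usage sketch, run inside the same tf.Session after training:

features = get_features_for_input("This was the best store i've ever seen.")
# features has shape (1, len(lexicon)), matching the x placeholder
result = sess.run(tf.argmax(prediction, 1), feed_dict={x: features})
print('Positive' if result[0] == 0 else 'Negative')
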
answered by Olivier Moindrot


Your get_features_for_input function returns a single list representing the features of one sentence, but for feed_dict the input needs to have shape [num_examples, feature_size]; here num_examples is 1.

The following code should work.

def get_features_for_input(input):
    current_words = word_tokenize(input.lower())
    current_words = [lemmatizer.lemmatize(i) for i in current_words]
    features = np.zeros(len(lexicon))

    for word in current_words:
        if word.lower() in lexicon:
            index_value = lexicon.index(word.lower())
            # OR DO +=1, test both
            features[index_value] += 1

    features = np.array(list(features))
    batch_features = []
    batch_features.append(features)
    return np.array(batch_features)
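
As a side note, NumPy can add the leading batch dimension directly, which avoids the intermediate Python list (a sketch of the equivalent reshape):

batch_features = features.reshape(1, -1)               # shape (1, len(lexicon))
# or: batch_features = np.expand_dims(features, axis=0)
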
answered by vikasreddy


The basic principle of any machine learning algorithm is that the dimensions must be the same during training and testing.

During training you created a matrix of shape (number of training samples, len(lexicon)). You are taking a bag-of-words approach here, and the lexicon is nothing but the unique words in your training data.

During testing, your input vector must have the same size as the training vectors, and that size is just the length of the lexicon created during training. Each element of the test vector counts the word at the corresponding index in the lexicon.
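
For example, with a toy lexicon of three words, every input sentence, train or test, is mapped to a count vector of exactly len(lexicon) entries (a hypothetical illustration):

lexicon = ['best', 'store', 'terrible']   # unique words collected from the training data
# "this was the best store" -> one count per lexicon word, aligned by index
features = [1, 1, 0]                      # len(features) == len(lexicon)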

Now, coming to your problem: in get_features_for_input(input) you use the lexicon, which you must have defined somewhere in your program. Given the error, my conclusion is that your lexicon list is empty, so in get_features_for_input the line features = np.zeros(len(lexicon)) produces an array of zero shape, and the loop body is never entered.

A few expected modifications:

You can find the function create_feature_sets_and_labels in your tutorial; it returns your cleaned, formatted training data. Change its return statement to return the lexicon list along with the data:

return train_x,train_y,test_x,test_y,lexicon

Make a small change when calling it, to collect the lexicon list:

train_x,train_y,test_x,test_y,lexicon = create_feature_sets_and_labels('/path/to/pos.txt','/path/to/neg.txt')

Then just pass this lexicon list along with your input to the get_features_for_input function:

features = get_features_for_input("This was the best store i've ever seen.",lexicon)

Finally, make a small change to the get_features_for_input function itself:

def get_features_for_input(text,lexicon):
    featureset = []
    current_words = word_tokenize(text.lower())
    current_words = [lemmatizer.lemmatize(i) for i in current_words]
    features = np.zeros(len(lexicon))
    for word in current_words:
        if word.lower() in lexicon:
            index_value = lexicon.index(word.lower())
            features[index_value] += 1
    featureset.append(features)
    return np.asarray(featureset)
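
Putting the pieces together, the prediction step would then look roughly like this (a sketch; variable and file names follow the tutorial):

train_x, train_y, test_x, test_y, lexicon = create_feature_sets_and_labels('/path/to/pos.txt', '/path/to/neg.txt')

# ... build the graph and train as before, then inside the same tf.Session:
input_data = "This was the best store i've ever seen."
features = get_features_for_input(input_data, lexicon)    # shape (1, len(lexicon))
result = sess.run(tf.argmax(prediction, 1), feed_dict={x: features})
print('Positive:' if result[0] == 0 else 'Negative:', input_data)
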
answered by Nilesh Birari