
Neural Network to predict nth square

I am trying to use a multi-layer neural network to predict the nth square.

I have the following training data containing the first 99 squares:

1    1
2    4
3    9
4    16
5    25
...
98   9604
99   9801

This is the code:

import numpy as np
import neurolab as nl

# Load input data
text = np.loadtxt('data_sq.txt')

# Separate it into datapoints and labels
data = text[:, :1]
labels = text[:, 1:]

# Define a multilayer neural network with 2 hidden layers;
# First hidden layer consists of 10 neurons
# Second hidden layer consists of 6 neurons
# Output layer consists of 1 neuron
nn = nl.net.newff([[0, 99]], [10, 6, 1]) 

# Train the neural network
error_progress = nn.train(data, labels, epochs=2000, show=10, goal=0.01) 

# Run the classifier on test datapoints
print('\nTest results:')
data_test = [[100], [101]]
for item in data_test:
    print(item, '-->', nn.sim([item])[0])

This prints 1 for both the 100th and 101st squares:

Test results:
[100] --> [ 1.]
[101] --> [ 1.]

What is the right way to do this?

gammay asked Mar 31 '17 12:03


2 Answers

Following Filip Malczak's and Seanny123's suggestions and comments, I implemented a neural network in TensorFlow to check what happens when we try to teach it to predict (and interpolate) the square function x^2.

Training on continuous interval

I trained the network on the interval [-7,7] (taking 300 points inside this interval, so that it is sampled densely), and then tested it on the interval [-30,30]. The activation functions are ReLU, and the network has 3 hidden layers of 50 units each, trained for 500 epochs. The result is shown in the figure below.

[Figure: network output vs. y=x^2 on the test interval [-30,30]]

So basically, inside (and also close to) the interval [-7,7], the fit is very good, and then the output continues more or less linearly outside. It is nice to see that, at least initially, the slope of the network's output tries to "match" the slope of x^2. If we widen the test interval, the two graphs diverge quite a lot, as one can see in the figure below:

[Figure: network output vs. y=x^2 on a wider test interval]
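The linear continuation outside the training interval is what one would expect from a ReLU network, since its output is piecewise linear in the input. As a minimal sketch (a hand-built toy, not the trained network from this answer; `toy_relu_net`, `knots` and `out_weights` are names made up for this illustration), the following one-hidden-layer ReLU network matches x^2 at every integer in [-7,7] and then continues with a constant slope:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hand-picked (NOT trained) weights, chosen purely for illustration.
# Hidden units are relu(x - k) and relu(-x - k) for k = 0..6, so the
# output has kinks at x = -6, ..., 6 and nowhere else.
knots = np.arange(7, dtype=float)            # 0, 1, ..., 6
out_weights = np.array([1.0] + [2.0] * 6)    # slope added at each kink

def toy_relu_net(x):
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    hidden_pos = relu(x - knots)             # active for x > k
    hidden_neg = relu(-x - knots)            # active for x < -k
    return (hidden_pos + hidden_neg) @ out_weights

inside = np.arange(-7, 8)
print(toy_relu_net(inside))                  # equals inside**2 at the integers
print(toy_relu_net([30.0, 31.0]))            # grows by a fixed 13 per unit step
```

At x=30 this toy network returns 348, while x^2 is 900: past the last kink there is nothing left to bend the output, so the gap keeps growing.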

Training on even numbers

Finally, if I instead train the network on the set of all even integers in the interval [-100,100], and apply it to the set of all integers (even and odd) in this interval, I get:

[Figure: network output on all integers in [-100,100] after training on the even integers]

When training the network to produce the image above, I increased the epochs to 2500 to get better accuracy. The rest of the parameters stayed unchanged. So it seems that interpolating "inside" the training interval works quite well (except perhaps for the area around 0, where the fit is a bit worse).

Here is the code that I used for the first figure:

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.python.framework.ops import reset_default_graph

#preparing training data
train_x=np.linspace(-7,7,300).reshape(-1,1)
train_y=train_x**2

#setting network features
dimensions=[50,50,50,1]
epochs=500
batch_size=5

reset_default_graph()
X=tf.placeholder(tf.float32, shape=[None,1])
Y=tf.placeholder(tf.float32, shape=[None,1])

weights=[]
biases=[]
n_inputs=1

#initializing variables
for i,n_outputs in enumerate(dimensions):
    with tf.variable_scope("layer_{}".format(i)):
        w=tf.get_variable(name="W",shape=[n_inputs,n_outputs],initializer=tf.random_normal_initializer(mean=0.0,stddev=0.02,seed=42))
        # in TF 1.x zeros_initializer() takes no shape; the shape goes to get_variable
        b=tf.get_variable(name="b",shape=[n_outputs],initializer=tf.zeros_initializer())
        weights.append(w)
        biases.append(b)
        n_inputs=n_outputs

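def forward_pass(X,weights,biases):
    #affine layer followed by ReLU, repeated for every layer
    #(the output layer is passed through ReLU too, which is fine here since x^2>=0)
    h=X
    for i in range(len(weights)):
        h=tf.add(tf.matmul(h,weights[i]),biases[i])
        h=tf.nn.relu(h)
    return h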

output_layer=forward_pass(X,weights,biases)
cost=tf.reduce_mean(tf.squared_difference(output_layer,Y),1)
cost=tf.reduce_sum(cost)
optimizer=tf.train.AdamOptimizer(learning_rate=0.01).minimize(cost)


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    #train the network
    for i in range(epochs):
        idx=np.arange(len(train_x))
        np.random.shuffle(idx)
        for j in range(len(train_x)//batch_size):
            cur_idx=idx[batch_size*j:batch_size*(j+1)]
            sess.run(optimizer,feed_dict={X:train_x[cur_idx],Y:train_y[cur_idx]})
        #current_cost=sess.run(cost,feed_dict={X:train_x,Y:train_y})
        #print(current_cost)
    #apply the network on the test data
    test_x=np.linspace(-30,30,300)
    network_output=sess.run(output_layer,feed_dict={X:test_x.reshape(-1,1)})    



plt.plot(test_x,test_x**2,color='r',label='y=x^2')
plt.plot(test_x,network_output,color='b',label='network output')
plt.legend(loc='center')
plt.show()
Miriam Farber answered Sep 28 '22 08:09


I checked the docs for neurolab: newff creates a network with a sigmoid transfer function in all neurons by default. The sigmoid's value is always in the range (-1, 1), so your output will never leave this range.

The second square (4) is already outside this range, so your code doesn't match your problem at all.
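To see why the question's output is stuck at 1, here is a small sketch. It assumes the default transfer function behaves like tanh (a sigmoid with range (-1, 1), as described above); the input values are arbitrary examples picked for illustration:

```python
import numpy as np

# Arbitrary pre-activation values, from strongly negative to very large.
pre_activations = np.array([-100.0, -1.0, 0.0, 1.0, 4.0, 100.0, 9801.0])

# Assumption: the output neuron applies a tanh-like sigmoid.
outputs = np.tanh(pre_activations)
print(outputs)

# However large the input to the neuron gets, the output saturates at 1
# and can never reach a target such as 4 or 9801.
print(outputs.max())
```

Under this assumption the best the network can do for large targets is to saturate at the upper bound, which is consistent with the `[ 1.]` printed for both test inputs.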

Try using other functions (I'd propose SoftPlus or ReLU). They work quite well with feed-forward networks, allow for backpropagation training (SoftPlus is differentiable everywhere, ReLU everywhere except at 0) and take values in (0, ∞) and [0, ∞) respectively, just as you need.

Also: the first parameter to newff defines the ranges of the input data. You're using [0, 99], which matches all the training data, but doesn't match the values that you tried while testing (since 100 and 101 are bigger than 99). Change this to something much bigger, so that the values you test on are not "special" (meaning "at the end of the range"). I'd propose something like [-300, 300].

Besides, as stated by Seanny123 in a comment, I don't think it's going to work at all, and with the current setup I can be sure that it won't. Good luck. Let me know (for example in the comments) if you succeed.

Last, but not least: what you're trying to do is extrapolation (figuring out values outside some range based on values in that range). NNs are better suited for interpolation (figuring out values inside the range based on samples from that range), as they are supposed to generalize the data used in training. Try teaching it the squares of, for example, every 3rd number (so 1, 16, 49, ...) and then testing by asking for the squares of the rest (for example the square of 2 or 8).
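The every-3rd split suggested above could be prepared like this. It is only a sketch of the data preparation with NumPy; the variable names are made up for this example and no particular network library is assumed:

```python
import numpy as np

numbers = np.arange(1, 100)              # 1..99, as in the question's data
squares = numbers ** 2

# Train on every 3rd number: 1, 4, 7, ... (squares 1, 16, 49, ...).
train_mask = (numbers % 3 == 1)
train_x, train_y = numbers[train_mask], squares[train_mask]

# Hold out the rest for testing: 2, 3, 5, 6, 8, ...
test_x, test_y = numbers[~train_mask], squares[~train_mask]

# A held-out input is an interpolation query only if it lies inside
# the range covered by the training inputs.
is_interpolation = (test_x >= train_x.min()) & (test_x <= train_x.max())
print(train_y[:3])
print(test_x[~is_interpolation])         # held-out inputs beyond the training range
```

Note that 98 and 99 fall past the last training input (97), so asking for them is already extrapolation, just like asking for 100 and 101.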

Filip Malczak answered Sep 28 '22 08:09