I’m trying to build a neural network that takes the vertex positions of a 3D mesh as input and outputs the coordinates of two points inside it.
For testing purposes I have a dataset in which each sample is a geometry with 20 points plus two points on the inside.
Every file of the dataset contains the vertex coordinates as a rank-2 array with shape [3,20] for the objs and shape [3,3] for the resulting points.
I’ve built a linear model, but the result is always very low (0.16), no matter whether I train it with 1,000, 100,000 or 500,000.
import tensorflow as tf
import numpy as np
objList = np.load('../testFullTensors/objsArray_00.npy')
guideList = np.load('../testFullTensors/drvsArray_00.npy')
x = tf.placeholder(tf.float32, shape=[None, 60])
y_ = tf.placeholder(tf.float32, shape=[None, 6])
W = tf.Variable(tf.zeros([60,6],tf.float32))
b = tf.Variable(tf.zeros([6],tf.float32))
y = tf.matmul(x,W) + b
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_step.run(feed_dict={x: objList, y_: guideList})
    correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    sess.run(tf.global_variables_initializer())
    print accuracy.eval(session=sess , feed_dict={x: objs, y_: guides})
Should I build a different kind of network?
Thanks E
First, thanks for the clarification of the question in the comments, it really helps understand the problem.
The problem as I understand it is (at least similar to) : given a bounding set of 3D points of the outside of an arm, identify
What we need is a model that has enough expressivity to be able to do this. Let us consider how this problem is easiest for a human first. If a human was given a 3D model that they could look at and rotate then it would be a visual problem and they would probably get it right away.
If instead they were given a list of 60 numbers without being told what those numbers meant, and had to produce 6 numbers as an answer, it might not be possible.
We know that TensorFlow is good at image recognition, so let's turn the problem into an image recognition problem.
Let's just start with the MNIST network and talk about what it would take to change it to our problem!
Convert your input to voxels such that each training example will be one 3D image of size [m,m,m] where m is the resolution you need (start with 30 or so for initial testing and maybe go as high as 128). Initialize your 3D matrix with 0's. Then for each of the 20 data points change the corresponding voxel to 1 (or a probability).
That is your input, and since you have lots of training examples you will have a tensor of shape [batch,m,m,m].
Do the same for your expected output.
Send that through layers of convolution (start with 2 or 3 for testing) such that your output size is [batch,m,m,m].
Use back propagation to train your output layer to predict your expected output.
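The voxelization step described above could be sketched roughly as follows. This is only a minimal NumPy illustration, not a tested pipeline: the helper name `voxelize`, the `bounds` argument and the random stand-in data are all assumptions made for the example. It assumes each mesh arrives as a [3, n] array, as in the question, and that the input and the expected output are mapped with the same bounding box so that voxel indices mean the same thing in both grids.

```python
import numpy as np

def voxelize(points, resolution=30, bounds=None):
    """Turn a [3, n] array of coordinates into an [m, m, m] occupancy grid.

    `bounds` is a hypothetical (lo, hi) pair of length-3 arrays; if omitted,
    the bounding box of `points` itself is used.
    """
    points = np.asarray(points, dtype=np.float64)
    if bounds is None:
        lo, hi = points.min(axis=1), points.max(axis=1)
    else:
        lo, hi = (np.asarray(b, dtype=np.float64) for b in bounds)
    # Avoid dividing by zero when the box is flat along an axis.
    span = np.where(hi > lo, hi - lo, 1.0)
    # Scale coordinates to [0, 1], then to integer voxel indices.
    scaled = (points - lo[:, None]) / span[:, None]
    idx = np.clip((scaled * resolution).astype(int), 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=np.float32)
    grid[idx[0], idx[1], idx[2]] = 1.0  # mark occupied voxels
    return grid

# Stand-in data: 20 random vertices and one made-up interior point.
rng = np.random.default_rng(0)
mesh = rng.uniform(-1.0, 1.0, size=(3, 20))
bounds = (mesh.min(axis=1), mesh.max(axis=1))
inside = mesh.mean(axis=1, keepdims=True)

x_grid = voxelize(mesh, 30, bounds)    # network input
y_grid = voxelize(inside, 30, bounds)  # expected output
batch = np.stack([x_grid])             # shape [batch, m, m, m]
```

Reusing the mesh's bounding box for the target grid matters here: if each grid were normalised on its own, a single interior point would always land in the same corner voxel.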
Finally you will have a network that doesn't return a 3D coordinate of the humerus but instead returns a probability map of where it is in 3D space. You can scan the output for the highest probabilities and read off the coordinates.
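Reading the coordinates back off the output could look something like the sketch below. It assumes the network output is a cubic [m, m, m] array of probabilities and that the bounding box (`lo`, `hi`) used to build the grid is still known; the function name `top_voxels_to_coords` and the toy array are made up for the example.

```python
import numpy as np

def top_voxels_to_coords(prob, lo, hi, k=2):
    """Return the centres of the k most probable voxels as a [k, 3] array."""
    prob = np.asarray(prob)
    m = prob.shape[0]  # assumes a cubic [m, m, m] grid
    # Flat indices of the k largest values, highest first.
    flat = np.argsort(prob, axis=None)[::-1][:k]
    idx = np.stack(np.unravel_index(flat, prob.shape), axis=1)
    lo = np.asarray(lo, dtype=np.float64)
    hi = np.asarray(hi, dtype=np.float64)
    # Map voxel indices back to world space, using each voxel's centre.
    return lo + (idx + 0.5) / m * (hi - lo)

# Toy probability map with two obvious peaks.
prob = np.zeros((4, 4, 4))
prob[1, 2, 3] = 0.9
prob[0, 0, 0] = 0.5
coords = top_voxels_to_coords(prob, lo=(0, 0, 0), hi=(4, 4, 4), k=2)
```

Taking the top k voxels is a simplification: with a real output the two highest values may belong to the same blob, so some form of peak separation (or one output grid per point) would probably be needed.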
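This is very similar to how AlphaGo picks moves in Go: its policy network outputs a probability for every board position rather than a single coordinate.
Suggested improvement: train one network to predict A and a separate network to predict B.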