
Tensorflow - matmul of input matrix with batch data

I have some data represented by input_x. It is a tensor whose first dimension is unknown (the data is fed in batches), and each item in it has size n. input_x goes through tf.nn.embedding_lookup, so embed has dimensions [?, n, m], where m is the embedding size and ? is the unknown batch size.

The setup looks like this:

input_x = tf.placeholder(tf.int32, [None, n], name="input_x")
embed = tf.nn.embedding_lookup(W, input_x)

I'm now trying to multiply each sample in my input data (which now has an extra embedding dimension) by a matrix variable, U, and I can't work out how to do that.

I first tried tf.matmul, but it raises an error because the shapes do not match. I then tried the following: expanding the dimension of U and applying batch_matmul (I also tried the function from tf.nn.math_ops., with the same result):

U = tf.Variable( ... )
U1 = tf.expand_dims(U, 0)
h = tf.batch_matmul(embed, U1)

This passes the initial compilation, but then when actual data is applied, I get the following error:

In[0].dim(0) and In[1].dim(0) must be the same: [64,58,128] vs [1,128,128]

I also know why this is happening: expanding U gave it a leading dimension of 1, which does not match the minibatch size of 64.

How can I do that matrix multiplication on my tensor-matrix input correctly (for unknown batch size)?

yoki asked Jul 06 '16 23:07



1 Answer

Previous answers are obsolete. Currently, tf.matmul() supports tensors with rank > 2:

The inputs must be matrices (or tensors of rank > 2, representing batches of matrices), with matching inner dimensions, possibly after transposition.

Also, tf.batch_matmul() has been removed, and tf.matmul() is now the right way to do batch multiplication. The following code shows the main idea:

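import tensorflow as tf

batch_size, n, m, k = 10, 3, 5, 2
# tf.random.normal is the current name of the older tf.random_normal alias
A = tf.Variable(tf.random.normal(shape=(batch_size, n, m)))
B = tf.Variable(tf.random.normal(shape=(batch_size, m, k)))
tf.matmul(A, B)  # one (n x k) product per batch entry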

The result is a tensor of shape (batch_size, n, k). Here is what happens: you have batch_size matrices of size n x m and batch_size matrices of size m x k. For each pair, multiplying (n x m) by (m x k) gives an n x k matrix, and you end up with batch_size of them.
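To make the pairing concrete, here is a minimal pure-Python sketch of what a batched matrix multiplication computes. Nested lists stand in for tensors, and matmul_2d and batched_matmul_reference are hypothetical helper names introduced only for this illustration; they are not TensorFlow functions, and the fill values are arbitrary.

```python
# Pure-Python reference for a batched matmul; illustrative only, not TensorFlow code.

def matmul_2d(a, b):
    """Multiply an n x m matrix by an m x k matrix (both given as lists of rows)."""
    m, k = len(b), len(b[0])
    assert all(len(row) == m for row in a), "inner dimensions must match"
    return [[sum(row[p] * b[p][j] for p in range(m)) for j in range(k)]
            for row in a]

def batched_matmul_reference(A, B):
    """Multiply A[i] by B[i] for every batch index i."""
    assert len(A) == len(B), "batch sizes must match"
    return [matmul_2d(a, b) for a, b in zip(A, B)]

batch_size, n, m, k = 10, 3, 5, 2
# Arbitrary fill values; only the shapes matter for this example.
A = [[[float(i + r + c) for c in range(m)] for r in range(n)]
     for i in range(batch_size)]
B = [[[float(i * r - c) for c in range(k)] for r in range(m)]
     for i in range(batch_size)]

C = batched_matmul_reference(A, B)
shape = (len(C), len(C[0]), len(C[0][0]))
print(shape)  # (10, 3, 2), i.e. (batch_size, n, k)
```

Each entry C[i] depends only on A[i] and B[i], which is why the two batch sizes have to agree.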

Notice that something like this is also valid:

a, b = 4, 6  # example sizes for the two leading batch dimensions (arbitrary)
A = tf.Variable(tf.random.normal(shape=(a, b, n, m)))
B = tf.Variable(tf.random.normal(shape=(a, b, m, k)))
tf.matmul(A, B)

and gives a tensor of shape (a, b, n, k).
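The question itself is a slightly different case: a single matrix U is shared by every sample, and the batch size is not known in advance. The sketch below shows that computation in plain Python, with nested lists standing in for tensors; matmul_2d and matmul_shared are hypothetical helper names, and the sizes are small stand-ins for the [64, 58, 128] and [128, 128] shapes in the error message. In TensorFlow, the same contraction can be written as tf.tensordot(embed, U, axes=1) or tf.einsum('bnm,mk->bnk', embed, U), neither of which needs the batch size; treat that as a pointer rather than code tested against the asker's graph.

```python
# Pure-Python sketch: multiply every sample in a batch by one shared matrix U.
# Illustrative only; the sizes are small stand-ins for the shapes in the question.

def matmul_2d(x, u):
    """Multiply an n x m matrix by an m x k matrix (both given as lists of rows)."""
    m, k = len(u), len(u[0])
    assert all(len(row) == m for row in x), "inner dimensions must match"
    return [[sum(row[p] * u[p][j] for p in range(m)) for j in range(k)]
            for row in x]

def matmul_shared(batch, u):
    """Apply the same matrix u to each sample, whatever the batch size is."""
    return [matmul_2d(x, u) for x in batch]

n, m = 3, 2
U = [[0.0, 1.0],
     [1.0, 0.0]]  # swaps the two embedding columns, so results are easy to check

for batch_size in (1, 4):  # the same code handles any batch size
    # Sample s, row r holds the embedding [s, r].
    embed = [[[float(s), float(r)] for r in range(n)] for s in range(batch_size)]
    h = matmul_shared(embed, U)
    print(len(h), len(h[0]), len(h[0][0]))  # batch_size, n, m
```

U is never replicated per sample here, which is the property the expand_dims attempt in the question was missing.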

Salvador Dali answered Sep 30 '22 03:09
