TensorFlow - Pad unknown size tensor to a specific size?


Is there a way to pad a tensor of variable size to a given shape with a specific pad value? For example given the tensors:

[[1, 2],
 [3, 4]]

and

[[1, 2, 3],
 [4, 5, 6]]

Is there a way to have a generic operation which would take either and pad them with a value (say, to shape [2, 4] with value -1) to result in:

[[1, 2, -1, -1],
 [3, 4, -1, -1]]

and

[[1, 2, 3, -1],
 [4, 5, 6, -1]]

respectively? My reasoning (in case there is a better solution) is that I have examples coming from a TFRecords file, part of which is variable-length. For processing, a static length makes them easier to work with.
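For context, here is a minimal sketch of the kind of parsing being described (the feature name 'values' and the record contents are invented for illustration; the actual feature spec may differ):

import tensorflow as tf

# Build one serialized record with a variable-length int64 feature.
example = tf.train.Example(features=tf.train.Features(feature={
    'values': tf.train.Feature(int64_list=tf.train.Int64List(value=[1, 2, 3]))}))

parsed = tf.parse_single_example(example.SerializeToString(),
                                 {'values': tf.VarLenFeature(tf.int64)})

# VarLenFeature yields a SparseTensor; densify it with a fill value of -1.
dense = tf.sparse_tensor_to_dense(parsed['values'], default_value=-1)

This densifies each example only to its own length, so padding out to one common static shape is still needed, which is what the answers below address.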

asked Feb 20 '17 by golmschenk

2 Answers

Yes, there is. Provided you do not need to change the rank of the tensor, it's very simple.

tf.pad() accepts regular Python lists whose entries can themselves be tensors. The paddings argument is a list of pairs, one per dimension, giving how much to pad before and after that dimension.

e.g.

import tensorflow as tf

sess = tf.Session()

t = tf.constant([[1, 2], [3, 4]])
# Pad the second dimension out to length 4 (note the index [1], not [0]).
paddings = [[0, 0], [0, 4 - tf.shape(t)[1]]]
out = tf.pad(t, paddings, 'CONSTANT', constant_values=-1)
sess.run(out)
# gives:
# array([[ 1,  2, -1, -1],
#        [ 3,  4, -1, -1]], dtype=int32)

If you want to generalise this to a useful function, you could do something like:

def pad_up_to(t, max_in_dims, constant_values):
    s = tf.shape(t)
    # Pad each dimension i from its dynamic size s[i] up to max_in_dims[i].
    paddings = [[0, m - s[i]] for (i, m) in enumerate(max_in_dims)]
    return tf.pad(t, paddings, 'CONSTANT', constant_values=constant_values)

where max_in_dims is essentially the desired shape of the output. Note: this function will fail if max_in_dims is smaller than the shape of t in any dimension, because that produces a negative padding amount.
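If that is a concern, one option (a sketch, not part of the original answer; the name pad_up_to_safe is made up) is to clamp each padding amount at zero, so oversized inputs pass through unpadded instead of raising an error:

def pad_up_to_safe(t, max_in_dims, constant_values):
    s = tf.shape(t)
    # tf.maximum clamps negative amounts to 0, so dimensions that already
    # meet or exceed the target size are left untouched.
    paddings = [[0, tf.maximum(m - s[i], 0)] for (i, m) in enumerate(max_in_dims)]
    return tf.pad(t, paddings, 'CONSTANT', constant_values=constant_values)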

You can use it like:

t = tf.constant([[1, 2], [3, 4]]) # shape = [2, 2]
t_padded = pad_up_to(t, [2, 4], -1) # shape = [2, 4], padded with -1s

or

import numpy as np

t = tf.placeholder(tf.float32, [None, None]) # shape = [?, ?]
t_padded = pad_up_to(t, [5, 5], -1) # shape = [5, 5], padded with -1s
t_np = np.random.uniform(0, 1, [3, 4]) # shape = [3, 4], unpadded input
t_padded_out = sess.run(t_padded, {t: t_np})
t_np2 = np.random.uniform(0, 1, [2, 1]) # shape = [2, 1], unpadded input
t_padded_out2 = sess.run(t_padded, {t: t_np2})

Although the dimension sizes are calculated dynamically, the number of dimensions is not, so make sure that max_in_dims has the same number of elements as t.shape.
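Since the rank is static, that mismatch can be caught at graph-construction time. For example (just a sketch; the wrapper name is invented):

def pad_up_to_checked(t, max_in_dims, constant_values):
    # t.shape.ndims is known statically even when the individual sizes are not.
    assert t.shape.ndims == len(max_in_dims), \
        'max_in_dims needs one entry per dimension of t'
    return pad_up_to(t, max_in_dims, constant_values)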

answered Oct 16 '22 by Multihunter

An extension of Multihunter's solution, so that padding is performed only when necessary and longer inputs do not produce an error:

Suppose we have a sequential input called inp_seq, which is a tensor of rank 4 and should be padded in order to have a minimum length of filter_size in dimension 1.

def dynamic_padding(inp, min_size):
    # pad_size is a tensor, computed when the graph is executed
    pad_size = min_size - tf.shape(inp)[1]
    paddings = [[0, 0], [0, pad_size], [0, 0], [0, 0]]
    return tf.pad(inp, paddings)

# Pad only if necessary
padded = tf.cond(tf.less(tf.shape(inp_seq)[1], filter_size),
                 true_fn=lambda: dynamic_padding(inp_seq, filter_size),
                 false_fn=lambda: inp_seq)
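To make this concrete, a runnable sketch (the shapes and the filter_size value here are invented for illustration):

import tensorflow as tf
import numpy as np

filter_size = 5
inp_seq = tf.placeholder(tf.float32, [2, None, 3, 3]) # rank 4, variable dimension 1
padded = tf.cond(tf.less(tf.shape(inp_seq)[1], filter_size),
                 true_fn=lambda: dynamic_padding(inp_seq, filter_size),
                 false_fn=lambda: inp_seq)

with tf.Session() as sess:
    short_seq = np.zeros([2, 3, 3, 3], np.float32)
    long_seq = np.zeros([2, 8, 3, 3], np.float32)
    print(sess.run(padded, {inp_seq: short_seq}).shape) # (2, 5, 3, 3): padded
    print(sess.run(padded, {inp_seq: long_seq}).shape)  # (2, 8, 3, 3): untouched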
answered Oct 16 '22 by Ataxias