I have an efficiency issue with the TensorFlow function py_func.
Context
In my project, I have a batch of tensors input_features of shape [?, max_items, m]. The first dimension is ? because it is dynamic (the batch is read by a custom TensorFlow reader and shuffled using tf.train.shuffle_batch_join()). The second dimension corresponds to an upper bound (the maximum number of items I can take per example), and the third dimension corresponds to the feature dimension. I also have a tensor num_items of shape (?,) (one entry per batch element) indicating the number of items in each example; the remaining entries are set to 0 (in numpy notation, input_feature[k, num_items[k]:, :] = 0).
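For concreteness, here is a small numpy sketch of this padding layout; the sizes and values below are made up for illustration:

```python
import numpy as np

# Hypothetical sizes, for illustration only.
batch_size, max_items, m = 3, 5, 2
num_items = np.array([2, 5, 3])  # number of valid items per example

input_features = np.random.rand(batch_size, max_items, m).astype(np.float32)
for k in range(batch_size):
    # Zero out the padding rows past each example's item count.
    input_features[k, num_items[k]:, :] = 0
```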
Issue
My workflow needs some custom Python operations (especially for indexing; for instance, I need to perform clustering operations on some chunks of examples), so I use a few numpy functions wrapped in the py_func function. This works well, but training becomes very slow (around 50 times slower than a model without this py_func), even though the wrapped function itself is not time consuming.
Questions
1 - Is this increase in computing time normal? The function wrapped in py_func gives me a new tensor that is multiplied further down the pipeline. Does that explain the computing time? (I mean the gradient may be harder to compute through such a function.)
2 - I'm trying to modify my processing to avoid the py_func function. However, it was very handy for extracting data with numpy indexing (especially with my data formatting), and I have some difficulty expressing it in a TF way. For instance, if I have a tensor t1 with shape [-1, n_max, m] (the first dimension is the batch size, which is dynamic) and a tensor t2 with shape [-1, 2] containing integers, is there an easy way to perform a mean operation in TensorFlow that results in t_mean_chunk with shape (-1, m), where (in numpy notation):
t_mean_chunk[i, :] = np.mean(t1[i, t2[i,0]:t2[i,1], :], axis=0)
?
This was (among other operations) the kind of thing I was doing in the wrapped function.
Question 1 is hard to answer without the exact py_func, but as hpaulj mentioned in his comment, it's not too surprising that it's slowing things down. As a worst-case fallback, tf.scan
or tf.while_loop
with a TensorArray
may be somewhat faster. However, the best case is to have a vectorized solution with TensorFlow ops, which I think is possible in this case.
As for question 2, I'm not sure if it counts as easy, but here's a function which computes your indexing expression:
import tensorflow as tf

def range_mean(index_ranges, values):
    """Take the mean of `values` along ranges specified by `index_ranges`.

    return[i, ...] = tf.reduce_mean(
        values[i, index_ranges[i, 0]:index_ranges[i, 1], ...], axis=0)

    Args:
      index_ranges: An integer Tensor with shape [N x 2].
      values: A Tensor with shape [N x M x ...].
    Returns:
      A Tensor with shape [N x ...] containing the means of `values` having
      indices in the ranges specified.
    """
    m_indices = tf.range(tf.shape(values)[1])[None]
    # Determine which parts of `values` will be in the result
    selected = tf.logical_and(tf.greater_equal(m_indices, index_ranges[:, :1]),
                              tf.less(m_indices, index_ranges[:, 1:]))
    n_indices = tf.tile(tf.range(tf.shape(values)[0])[..., None],
                        [1, tf.shape(values)[1]])
    segments = tf.where(selected, n_indices + 1, tf.zeros_like(n_indices))
    # Throw out segment 0, since that's our "not included" segment
    segment_sums = tf.unsorted_segment_sum(
        data=values,
        segment_ids=segments,
        num_segments=tf.shape(values)[0] + 1)[1:]
    divisor = tf.cast(index_ranges[:, 1] - index_ranges[:, 0],
                      dtype=values.dtype)
    # Pad the shape of `divisor` so that it broadcasts against `segment_sums`.
    divisor_shape_padded = tf.reshape(
        divisor,
        tf.concat([tf.shape(divisor),
                   tf.ones([tf.rank(values) - 2], dtype=tf.int32)], axis=0))
    return segment_sums / divisor_shape_padded
Example usage:
import numpy

index_range_tensor = tf.constant([[2, 4], [1, 6], [0, 3], [0, 9]])
values_tensor = tf.reshape(tf.range(4 * 10 * 5, dtype=tf.float32), [4, 10, 5])
with tf.Session():
    tf_result = range_mean(index_range_tensor, values_tensor).eval()
    index_range_np = index_range_tensor.eval()
    values_np = values_tensor.eval()
for i in range(values_np.shape[0]):
    print("Slice {}: ".format(i),
          tf_result[i],
          numpy.mean(values_np[i, index_range_np[i, 0]:index_range_np[i, 1], :],
                     axis=0))
Prints:
Slice 0: [ 12.5 13.5 14.5 15.5 16.5] [ 12.5 13.5 14.5 15.5 16.5]
Slice 1: [ 65. 66. 67. 68. 69.] [ 65. 66. 67. 68. 69.]
Slice 2: [ 105. 106. 107. 108. 109.] [ 105. 106. 107. 108. 109.]
Slice 3: [ 170. 171. 172. 173. 174.] [ 170. 171. 172. 173. 174.]