Is there any way to convert a string tensor to lower case without evaluating it in the session? Some sort of tf.string_to_lower op?
More specifically, I am reading data from tfrecords files, so my data is made of tensors. I then want to use tf.contrib.lookup.index_table_from_* to look up indices for words in the data, and I need this to be case-insensitive. Lowercasing the data before writing it to tfrecords is not an option, as it needs to be kept in its original format. One option would be to store both the original and the lowercased versions, but I'd like to avoid this if possible.
Here's an implementation with tensorflow ops:
import tensorflow as tf

def lowercase(s):
    # Upper- and lowercase ASCII letters, index-aligned ('A' <-> 'a', ..., 'Z' <-> 'z').
    upchars = tf.constant([chr(i) for i in range(65, 91)], dtype=tf.string)
    lchars = tf.constant([chr(i) for i in range(97, 123)], dtype=tf.string)
    # Maps 'A'..'Z' to 0..25; every other character lands in the single OOV bucket (index 26).
    upcharslut = tf.contrib.lookup.index_table_from_tensor(mapping=upchars, num_oov_buckets=1, default_value=-1)
    # Split the input into single characters.
    splitchars = tf.string_split(tf.reshape(s, [-1]), delimiter="").values
    upcharinds = upcharslut.lookup(splitchars)
    # Swap each uppercase character for its lowercase counterpart, keep the rest, then rejoin.
    return tf.reduce_join(tf.map_fn(lambda x: tf.cond(x[0] > 25, lambda: x[1], lambda: lchars[x[0]]), (upcharinds, splitchars), dtype=tf.string))

if __name__ == "__main__":
    s = "komoDO DragoN "
    sess = tf.Session()
    x = lowercase(s)
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())
    print(sess.run([x]))
This returns [b'komodo dragon '].
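To make the logic of the graph version above easier to follow, here is a plain-Python sketch of the same idea with no TensorFlow involved. The names lowercase_sketch, UPPER_INDEX and OOV_INDEX are made up for illustration; the dictionary stands in for the index table, and index 26 plays the role of the single out-of-vocabulary bucket.

```python
# Plain-Python analogue of the lookup-table approach (illustration only).
UPPER = [chr(i) for i in range(65, 91)]    # 'A'..'Z'
LOWER = [chr(i) for i in range(97, 123)]   # 'a'..'z'
OOV_INDEX = len(UPPER)                     # 26: bucket for everything else
UPPER_INDEX = {c: i for i, c in enumerate(UPPER)}

def lowercase_sketch(s):
    chars = list(s)                                       # split into characters
    inds = [UPPER_INDEX.get(c, OOV_INDEX) for c in chars]  # table lookup
    # Index > 25 means "not an uppercase ASCII letter": keep the character as-is.
    return "".join(c if i > 25 else LOWER[i] for i, c in zip(inds, chars))

print(lowercase_sketch("komoDO DragoN "))
```

As in the graph version, only the 26 ASCII uppercase letters are mapped; digits, spaces and non-ASCII characters pass through unchanged.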
You can use tf.py_func to wrap a Python function that manipulates your string; the function is then executed within the graph.
You can do something like:
# Assuming your string tensor is tensorA
lower = tf.py_func(lambda x: x.lower(), [tensorA], tf.string, stateful=False)
# As of TF 2.0, `tf.py_func` is deprecated, so the equivalent code is
lower = tf.py_function(lambda x: x.numpy().lower(), [tensorA], tf.string)
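One caveat that may matter for the py_func route: assuming the value reaches the Python function as a bytes object (which the x.lower() call above suggests), bytes.lower() only converts ASCII letters. The sketch below is plain Python, not TensorFlow code; the helper name lower_utf8 is hypothetical, and it assumes the strings are UTF-8 encoded.

```python
def lower_utf8(raw: bytes) -> bytes:
    """Lowercase a UTF-8 byte string, including non-ASCII letters."""
    return raw.decode("utf-8").lower().encode("utf-8")

word = "ÉCOLE".encode("utf-8")
ascii_only = word.lower()   # bytes.lower() leaves the non-ASCII 'É' untouched
full = lower_utf8(word)     # decoding first lets str.lower() handle 'É' as well

print(ascii_only.decode("utf-8"))
print(full.decode("utf-8"))
```

If the vocabulary is ASCII-only, the simpler x.lower() is enough; a function like lower_utf8 could be passed in its place if accented characters need to match as well.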