I am running a model with a very large word embedding (>2M words). When I use tf.nn.embedding_lookup, it expects the full embedding matrix, which is big, so I run out of GPU memory. If I reduce the size of the embedding, everything works fine.
Is there a way to deal with a larger embedding?
The recommended way is to use a partitioner to shard this large tensor into several parts:
embedding = tf.get_variable("embedding", [1000000000, 20],
                            partitioner=tf.fixed_size_partitioner(3))
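For reference, tf.nn.embedding_lookup accepts the partitioned variable like any other. A minimal sketch (TF 1.x API; the 2M vocabulary size and the placeholder name are just illustrative, not from your setup):

import tensorflow as tf

# Indices of the words to look up.
word_ids = tf.placeholder(tf.int32, [None], name="word_ids")

# The embedding is created as three shards along axis 0.
embedding = tf.get_variable("embedding", [2000000, 20],
                            partitioner=tf.fixed_size_partitioner(3))

# The lookup gathers rows from whichever shard holds them.
embedded = tf.nn.embedding_lookup(embedding, word_ids)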
The partitioner splits the tensor into 3 shards along axis 0, while the rest of the program sees it as an ordinary tensor. The biggest benefit comes from combining a partitioner with parameter server replication, like this:
with tf.device(tf.train.replica_device_setter(ps_tasks=3)):
    embedding = tf.get_variable("embedding", [1000000000, 20],
                                partitioner=tf.fixed_size_partitioner(3))
The key function here is tf.train.replica_device_setter. It allows you to run 3 different processes, called parameter servers, that store all of the model's variables. The large embedding tensor will be split across these servers.
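To make that concrete, here is a rough sketch of the cluster side (TF 1.x API; the job names and host:port addresses are placeholders for your own machines, not part of the original answer):

import tensorflow as tf

# Describe the cluster: three parameter servers and one worker.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0:2222", "ps1:2222", "ps2:2222"],
    "worker": ["worker0:2222"],
})

# Each process starts a server for its own job/task, e.g. for ps task 0:
#   server = tf.train.Server(cluster, job_name="ps", task_index=0)
#   server.join()  # a ps process just serves variables and blocks here

# In the worker process, replica_device_setter places variables (including
# the three embedding shards) round-robin on the ps tasks, while ops stay
# on the worker:
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    embedding = tf.get_variable("embedding", [1000000000, 20],
                                partitioner=tf.fixed_size_partitioner(3))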