I am going through tensorflow tutorial tensorflow. I would like to find description of the following line:
tf.contrib.layers.embedding_column
I wonder if it uses word2vec or anything else, or maybe I am thinking in completely wrong direction. I tried to click around on GibHub, but found nothing. I am guessing looking on GitHub is not going to be easy, since python might refer to some C++ libraries. Could anybody point me in the right direction?
I've been wondering about this too. It's not really clear to me what they're doing, but this is what I found.
In the paper on wide and deep learning, they describe the embedding vectors as being randomly initialized and then adjusted during training to minimize error.
Normally when you do embeddings, you take some arbitrary vector representation of the data (such as one-hot vectors) and then multiply it by a matrix that represents the embedding. This matrix can be found by PCA or while training by something like t-SNE or word2vec.
The actual code for the embedding_column is here, and it's implemented as a class called _EmbeddingColumn which is a subclass of _FeatureColumn. It stores the embedding matrix inside its sparse_id_column attribute. Then, the method to_dnn_input_layer applies this embedding matrix to produce the embeddings for the next layer.
def to_dnn_input_layer(self,
input_tensor,
weight_collections=None,
trainable=True):
output, embedding_weights = _create_embedding_lookup(
input_tensor=self.sparse_id_column.id_tensor(input_tensor),
weight_tensor=self.sparse_id_column.weight_tensor(input_tensor),
vocab_size=self.length,
dimension=self.dimension,
weight_collections=_add_variable_collection(weight_collections),
initializer=self.initializer,
combiner=self.combiner,
trainable=trainable)
So as far as I can see, it seems like the embeddings are formed by applying whatever learning rule you're using (gradient descent, etc.) to the embedding matrix.
I had a similar doubt about embeddings.
Here is the main point:
The ability of adding an embedding layer along with tradition wide linear models allows for accurate predictions by reducing sparse dimensionality down to low dimensionality.
Here is a good post about it!
And here is a simple example combining embedding layers. Using the Titanic Kaggle data to predict whether or not the passenger will survive based on certain attributes like Name, Sex, what ticket they had, the fare they paid the cabin they stayed in etc.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With