
What is dimensionality in word embeddings?

I want to understand what is meant by "dimensionality" in word embeddings.

When I embed a word in the form of a matrix for NLP tasks, what role does dimensionality play? Is there a visual example which can help me understand this concept?

asked Jul 29 '17 by manoveg


2 Answers

Answer

A Word Embedding is just a mapping from words to vectors. Dimensionality in word embeddings refers to the length of these vectors.

Additional Info

These mappings come in different formats. Most pre-trained embeddings are available as a space-separated text file, where each line contains a word in the first position, and its vector representation next to it. If you were to split these lines, you would find out that they are of length 1 + dim, where dim is the dimensionality of the word vectors, and 1 corresponds to the word being represented. See the GloVe pre-trained vectors for a real example.

For example, if you download glove.twitter.27B.zip, unzip it, and run the following Python code:

#!/usr/bin/python3

with open('glove.twitter.27B.50d.txt', encoding='utf-8') as f:
    lines = f.readlines()
lines = [line.rstrip().split() for line in lines]

print(len(lines))          # number of words (aka vocabulary size)
print(len(lines[0]))       # length of a line
print(lines[130][0])       # word 130
print(lines[130][1:])      # vector representation of word 130
print(len(lines[130][1:])) # dimensionality of word 130

you would get the output

1193514
51
people
['1.4653', '0.4827', ..., '-0.10117', '0.077996']  # shortened for illustration purposes
50

Somewhat unrelated, but equally important, is that lines in these files are sorted according to the word frequency found in the corpus in which the embeddings were trained (most frequent words first).
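
For instance, reusing the lines list from the snippet above, you could peek at the two ends of that frequency-sorted vocabulary (which tokens you actually get depends on the file):

print(lines[0][0])   # most frequent token in the training corpus
print(lines[-1][0])  # least frequent token kept in the vocabulary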


You could also represent these embeddings as a dictionary where the keys are the words and the values are lists representing word vectors. The length of these lists would be the dimensionality of your word vectors.
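
As a minimal sketch of that dictionary representation, reusing the lines list from the snippet above (the conversion to floats is added here purely for convenience):

# build a word -> vector dictionary from the parsed lines
embeddings = {}
for line in lines:
    word = line[0]
    vector = [float(x) for x in line[1:]]
    embeddings[word] = vector

print(len(embeddings['people']))  # 50, i.e. the dimensionality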

A more common practice is to represent them as matrices (also called lookup tables), of dimension (V x D), where V is the vocabulary size (i.e., how many words you have), and D is the dimensionality of each word vector. In this case you need to keep a separate dictionary mapping each word to its corresponding row in the matrix.
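
A rough sketch of that lookup-table representation, again reusing the lines list above and assuming numpy is available:

import numpy as np

# stack all vectors into a (V x D) matrix and keep a word -> row mapping
words = [line[0] for line in lines]
vectors = np.array([[float(x) for x in line[1:]] for line in lines],
                   dtype=np.float32)
word2row = {word: i for i, word in enumerate(words)}

print(vectors.shape)                    # (1193514, 50), i.e. (V, D)
print(vectors[word2row['people']][:3])  # first 3 components of "people"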

Background

Regarding your question about the role dimensionality plays, answering it fully requires some theoretical background. But in a few words, the space in which words are embedded has useful properties that allow NLP systems to perform better. One of these properties is that words with similar meanings are spatially close to each other, that is, they have similar vector representations, as measured by Euclidean distance or cosine similarity.
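
As a quick illustration of that closeness property, here is a sketch of cosine similarity computed on the vectors matrix and word2row mapping built above (the chosen tokens are assumed to be in the vocabulary):

import numpy as np

def cosine_similarity(u, v):
    # cosine of the angle between two vectors; closer to 1 means more similar
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

road = vectors[word2row['road']]
roads = vectors[word2row['roads']]
people = vectors[word2row['people']]

print(cosine_similarity(road, roads))   # relatively high
print(cosine_similarity(road, people))  # noticeably lower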

You can visualize a 3D projection of several word embeddings here, and see, for example, that the closest words to "roads" are "highways", "road", and "routes" in the Word2Vec 10K embedding.

For a more detailed explanation I recommend reading the section "Word Embeddings" of this post by Christopher Olah.

For more theory on why using word embeddings, which are an instance of distributed representations, is better than using, for example, one-hot encodings (local representations), I recommend reading the first sections of Distributed Representations by Geoffrey Hinton et al.
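
To make that contrast concrete, here is a toy sketch comparing the size of a one-hot vector with that of a dense embedding, reusing the objects built above (the numbers refer to the 50-dimensional Twitter file):

# one-hot: vocabulary-sized, all zeros except a single 1 at the word's index
vocab_size = len(lines)
one_hot = [0] * vocab_size
one_hot[word2row['people']] = 1

print(len(one_hot))                      # 1193514 components, a single 1
print(len(vectors[word2row['people']]))  # 50 components, all informative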

answered Sep 28 '22 by jabalazs

Textual data has to be converted into numeric data before it can be fed into any machine learning algorithm. Word embedding is one approach to this, where each word is mapped to a vector.

In algebra, a vector is a quantity with both magnitude and direction. In simpler terms, a vector is a one-dimensional array (or a single-column matrix), and dimensionality is the number of elements in that array.

Pre-trained word embedding models such as GloVe and Word2vec provide vectors in several dimensionalities, for instance 50, 100, 200, or 300. Each word is a point in D-dimensional space, and words with similar meanings are points close to each other. Higher dimensions can capture more information and improve accuracy, but they also increase the computational cost.
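
As an optional sketch, such pre-trained vectors can also be loaded through gensim's downloader; the model name 'glove-twitter-50' and the calls below are assumptions about that library rather than something required by this answer:

import gensim.downloader as api

# downloads and caches the 50-dimensional Twitter GloVe vectors
glove = api.load('glove-twitter-50')

print(glove['people'].shape)                # (50,), the dimensionality
print(glove.most_similar('roads', topn=3))  # nearby words in that space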

answered Sep 28 '22 by Kaustuv