 

Keras Word2Vec implementation

I'm using the implementation found in http://adventuresinmachinelearning.com/word2vec-keras-tutorial/ to learn something about word2vec. What I don't understand is why the loss function isn't decreasing:

Iteration 119200, loss=0.7305528521537781
Iteration 119300, loss=0.6254740953445435
Iteration 119400, loss=0.8255964517593384
Iteration 119500, loss=0.7267132997512817
Iteration 119600, loss=0.7213149666786194
Iteration 119700, loss=0.6156617999076843
Iteration 119800, loss=0.11473365128040314
Iteration 119900, loss=0.6617216467857361

The network, as far as I understand, is the standard one used for this task:

# assumed imports (not shown in the question)
from keras.layers import Input, Embedding, Reshape, Dot, Dense
from keras.models import Model

# one target word index and one context word index per sample
input_target = Input((1,))
input_context = Input((1,))

# shared embedding layer for target and context words
embedding = Embedding(vocab_size, vector_dim, input_length=1, name='embedding')

target = embedding(input_target)
target = Reshape((vector_dim, 1))(target)
context = embedding(input_context)
context = Reshape((vector_dim, 1))(context)

# similarity of the two word vectors, squashed to a probability
dot_product = Dot(axes=1)([target, context])
dot_product = Reshape((1,))(dot_product)
output = Dense(1, activation='sigmoid')(dot_product)

model = Model(inputs=[input_target, input_context], outputs=output)
model.compile(loss='binary_crossentropy', optimizer='rmsprop')  # adam??

Words come from a vocabulary of size 10000 taken from http://mattmahoney.net/dc/text8.zip (English text).

What I notice is that some words are somewhat learned over time, e.g. the context for numbers and articles is easily guessed, yet the loss has been stuck around 0.7 from the beginning and only fluctuates randomly as the iterations go on.

The training part is done like this (which seems strange to me, given the absence of the standard fit method):

import numpy as np

arr_1 = np.zeros((1,))
arr_2 = np.zeros((1,))
arr_3 = np.zeros((1,))
for cnt in range(epochs):
    # pick a single (target, context, label) triple at random
    idx = np.random.randint(0, len(labels)-1)
    arr_1[0,] = word_target[idx]
    arr_2[0,] = word_context[idx]
    arr_3[0,] = labels[idx]
    loss = model.train_on_batch([arr_1, arr_2], arr_3)
    if cnt % 100 == 0:
        print("Iteration {}, loss={}".format(cnt, loss))

Am I missing something important about this type of network? Anything not shown is implemented exactly as in the link above.
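
For completeness, the data preparation I omitted follows the tutorial's skip-gram sampling and looks roughly like this (a rough sketch, not my exact code; the window size and variable names are assumptions):

import numpy as np
from keras.preprocessing.sequence import skipgrams, make_sampling_table

vocab_size = 10000
window_size = 3  # assumed value

# `data` is the text8 corpus encoded as a list of word indices (built as in the tutorial)
sampling_table = make_sampling_table(vocab_size)
couples, labels = skipgrams(data, vocab_size,
                            window_size=window_size,
                            sampling_table=sampling_table)
word_target, word_context = zip(*couples)
word_target = np.array(word_target, dtype="int32")
word_context = np.array(word_context, dtype="int32")
labels = np.array(labels, dtype="int32")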

asked Jun 26 '18 by Marco Pietrosanto



1 Answer

I followed the same tutorial, and the loss does drop once the algorithm goes through a sample again. Note that the loss function is calculated only for the current target and context word pair. In the code example from the tutorial, one "epoch" is only one sample, so you would need more iterations than there are target and context words before the loss starts to drop.

I implemented the training part with the following line instead:

model.fit([word_target, word_context], labels, epochs=5)

Be warned that this can take a long time, depending on how large the corpus is. The train_on_batch function gives you more control over training: you can vary the batch size or select specific samples at every step.
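
If you want to keep train_on_batch, one option is to feed it random mini-batches instead of single pairs, roughly like this (a sketch only; the batch size of 256 is an arbitrary choice, and word_target, word_context, labels are the arrays from your question):

import numpy as np

batch_size = 256  # arbitrary choice

word_target = np.asarray(word_target)
word_context = np.asarray(word_context)
labels = np.asarray(labels)

for cnt in range(epochs):
    # sample a random mini-batch of (target, context, label) triples
    idx = np.random.randint(0, len(labels), size=batch_size)
    loss = model.train_on_batch([word_target[idx], word_context[idx]],
                                labels[idx])
    if cnt % 100 == 0:
        print("Iteration {}, loss={}".format(cnt, loss))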

answered Sep 23 '22 by Nikolai Janakiev