The HuggingFace BERT TensorFlow implementation allows us to feed in a precomputed embedding in place of the embedding lookup that is native to BERT. This is done using the model's call method's optional parameter inputs_embeds (in place of input_ids). To test this out, I wanted to make sure that if I did feed in BERT's embedding lookup, I would get the same result as having fed in the input_ids themselves.
The result of BERT's embedding lookup can be obtained by setting the BERT configuration parameter output_hidden_states to True and extracting the first tensor from the last output of the call method. (The remaining 12 outputs correspond to each of the 12 hidden layers.)
Thus, I wrote the following code to test my hypothesis:
import tensorflow as tf
from transformers import BertConfig, BertTokenizer, TFBertModel
bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
input_ids = tf.constant(bert_tokenizer.encode("Hello, my dog is cute", add_special_tokens=True))[None, :]
attention_mask = tf.stack([tf.ones(shape=(len(sent),)) for sent in input_ids])
token_type_ids = tf.stack([tf.ones(shape=(len(sent),)) for sent in input_ids])
config = BertConfig.from_pretrained('bert-base-uncased', output_hidden_states=True)
bert_model = TFBertModel.from_pretrained('bert-base-uncased', config=config)
result = bert_model(inputs={'input_ids': input_ids,
                            'attention_mask': attention_mask,
                            'token_type_ids': token_type_ids})
inputs_embeds = result[-1][0]
result2 = bert_model(inputs={'inputs_embeds': inputs_embeds,
                             'attention_mask': attention_mask,
                             'token_type_ids': token_type_ids})
print(tf.reduce_sum(tf.abs(result[0] - result2[0]))) # 458.2522, should be 0
Again, the output of the call method is a tuple. The first element of this tuple is the output of the last layer of BERT. Thus, I expected result[0] and result2[0] to match. Why is this not the case?
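As a quick sanity check (just to rule out a shape mismatch before comparing values):
print(result[0].shape)   # (1, seq_len, 768): last-layer output computed from input_ids
print(result2[0].shape)  # (1, seq_len, 768): last-layer output computed from inputs_embeds
print(tf.reduce_max(tf.abs(result[0] - result2[0])))  # largest per-element difference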
I am using Python 3.6.10 with tensorflow version 2.1.0 and transformers version 2.5.1.
EDIT: Looking at some of the HuggingFace code, it seems that the raw embeddings that are looked up when input_ids is given, or assigned when inputs_embeds is given, are added to the positional embeddings and token type embeddings before being fed into subsequent layers. If this is the case, then it may be possible that what I'm getting from result[-1][0] is the raw embedding plus the positional and token type embeddings. This would mean that they are erroneously getting added in again when I feed result[-1][0] as inputs_embeds in order to calculate result2.
Could someone please tell me whether this is the case and, if so, explain how to get the positional and token type embeddings so that I can subtract them out? Below is what I came up with for positional embeddings, based on the equations given here (but according to the BERT paper, the positional embeddings may actually be learned, so I'm not sure whether these are valid):
import numpy as np
# Sinusoidal positional embeddings as in "Attention Is All You Need".
positional_embeddings = np.stack([np.zeros(shape=(len(sent), 768)) for sent in input_ids])
for s in range(len(positional_embeddings)):
    for i in range(len(positional_embeddings[s])):
        for j in range(len(positional_embeddings[s][i])):
            if j % 2 == 0:
                positional_embeddings[s][i][j] = np.sin(i / np.power(10000., j / 768.))
            else:
                positional_embeddings[s][i][j] = np.cos(i / np.power(10000., (j - 1.) / 768.))
positional_embeddings = tf.constant(positional_embeddings, dtype=tf.float32)  # cast to match BERT's float32
inputs_embeds += positional_embeddings
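Since BERT's positional embeddings are actually learned rather than sinusoidal, it may make more sense to read them (and the token type embeddings) directly off the model's embedding layer. The sketch below assumes the transformers 2.x TF implementation, where the layer returned by get_input_embeddings() exposes position_embeddings and token_type_embeddings as Keras Embedding sublayers; note also that the embedding layer applies a LayerNorm after summing, so subtracting these from result[-1][0] still would not recover the raw word embeddings exactly:
emb_layer = bert_model.bert.get_input_embeddings()     # the model's embeddings layer
seq_len = tf.shape(input_ids)[1]
position_ids = tf.range(seq_len)[None, :]               # (1, seq_len)
learned_position_embeds = emb_layer.position_embeddings(position_ids)                         # (1, seq_len, 768)
learned_token_type_embeds = emb_layer.token_type_embeddings(tf.cast(token_type_ids, tf.int32))  # (1, seq_len, 768)
# Subtraction alone is not exact, because the embedding layer also applies LayerNorm
# (and dropout at training time) to the summed embeddings.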
ANSWER: My intuition about the positional and token type embeddings being added in turned out to be correct. After looking closely at the code, I replaced the line:
inputs_embeds = result[-1][0]
with the lines:
embeddings = bert_model.bert.get_input_embeddings().word_embeddings
inputs_embeds = tf.gather(embeddings, input_ids)
Now, the difference is 0.0, as expected.
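Putting it together, the working comparison is simply the following (tf.gather on the word embedding matrix reproduces the lookup the model performs internally when it is given input_ids, at least in transformers 2.5.1; the positional and token type embeddings and the LayerNorm are then applied inside the model in both cases):
embeddings = bert_model.bert.get_input_embeddings().word_embeddings  # (vocab_size, 768)
inputs_embeds = tf.gather(embeddings, input_ids)                      # raw lookup, (1, seq_len, 768)
result2 = bert_model(inputs={'inputs_embeds': inputs_embeds,
                             'attention_mask': attention_mask,
                             'token_type_ids': token_type_ids})
print(tf.reduce_sum(tf.abs(result[0] - result2[0])))  # 0.0, as expected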