How to get intermediate layers' output of pre-trained BERT model in HuggingFace Transformers library?

(I'm following this PyTorch tutorial about BERT word embeddings, and in the tutorial the author accesses the intermediate layers of the BERT model.)

What I want is to access the last, let's say, 4 layers of the BERT model for a single input token, in TensorFlow 2 using HuggingFace's Transformers library. Because each layer outputs a vector of length 768, the last 4 layers concatenated give a vector of length 4*768 = 3072 (for each token).

How can I implement this in TF/Keras/TF2, to get the intermediate layers of a pretrained model for an input token? (Later I will try to get the vectors for each token in a sentence, but for now one token is enough.)

I'm using the HuggingFace's BERT model:

!pip install transformers
from transformers import (TFBertModel, BertTokenizer)

bert_model = TFBertModel.from_pretrained("bert-base-uncased")  # Automatically loads the config
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sentence_marked = "hello"
tokenized_text = bert_tokenizer.tokenize(sentence_marked)
indexed_tokens = bert_tokenizer.convert_tokens_to_ids(tokenized_text)

print (indexed_tokens)
>> prints [7592]

The output is a token ID ([7592]), which should be the input to the BERT model.
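As a minimal sketch (assuming the bert_model loaded above), the token IDs just need a batch dimension and can then be passed to the model directly; the tf.constant wrapping is my addition:

import tensorflow as tf

# Wrap the token IDs in a batch dimension: shape (1, sequence_length)
input_ids = tf.constant([indexed_tokens])

# The first element of the output is the last layer's hidden states,
# of shape (batch_size, sequence_length, 768)
outputs = bert_model(input_ids)
print(outputs[0].shape)  # (1, 1, 768) for the single token "hello"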

Asked Apr 27 '20 by Yagel


People also ask

How many layers are there in the pre-trained BERT model?

The BERT-Base model uses 12 transformer blocks with a hidden size of 768 and 12 self-attention heads, and has around 110M trainable parameters.

How do you use BERT from the Hugging Face Transformers library?

You can use the same tokenizer for all of the various BERT models that Hugging Face provides. Because BERT can only accept 512 tokens at a time as input, we must set the truncation parameter to True. The add_special_tokens parameter tells the tokenizer to add special tokens such as [CLS] at the start and [SEP] at the end.
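For example, a small sketch with the tokenizer loaded above (the sample sentence is made up, and passing truncation as a keyword argument assumes a reasonably recent transformers release):

# encode_plus adds [CLS]/[SEP] and truncates anything beyond max_length
encoded = bert_tokenizer.encode_plus(
    "hello world, this is a test sentence",
    add_special_tokens=True,   # prepend [CLS] and append [SEP]
    truncation=True,           # cut off inputs longer than max_length
    max_length=512,
)
print(encoded["input_ids"])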

What is intermediate size in BERT?

intermediate_size ( int , optional, defaults to 3072) — Dimensionality of the “intermediate” (often named feed-forward) layer in the Transformer encoder. hidden_act ( str or Callable , optional, defaults to "gelu" ) — The non-linear activation function (function or string) in the encoder and pooler.
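Both values can be read straight off the model config; a quick sketch using the bert-base-uncased checkpoint from the question:

from transformers import BertConfig

config = BertConfig.from_pretrained("bert-base-uncased")
print(config.num_hidden_layers)   # 12 transformer layers
print(config.hidden_size)         # 768
print(config.intermediate_size)   # 3072, the feed-forward layer width
print(config.hidden_act)          # "gelu"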

What is the pre-trained BERT model?

During pre-training, the model is trained on unlabeled data over different pre-training tasks. For fine-tuning, the BERT model is first initialized with the pre-trained parameters, and all of the parameters are fine-tuned using labeled data from the downstream tasks.


1 Answer

The third element of the BERT model's output is a tuple which consists of the output of the embedding layer as well as the hidden states of the intermediate layers. From the documentation:

hidden_states (tuple(tf.Tensor), optional, returned when config.output_hidden_states=True): tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

For the bert-base-uncased model, config.output_hidden_states is True by default. Therefore, to access the hidden states of the 12 intermediate layers, you can do the following:

outputs = bert_model(input_ids, attention_mask)
hidden_states = outputs[2][1:]  # drop the embedding output, keep the 12 layer outputs

There are 12 elements in the hidden_states tuple, corresponding to all the layers from the first to the last, and each of them is an array of shape (batch_size, sequence_length, hidden_size). So, for example, to access the hidden state of the third layer for the fifth token of all the samples in the batch, you can do: hidden_states[2][:,4].
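To get back to the original question, the last 4 of these layers can be concatenated along the hidden dimension to produce one 4*768 = 3072-dimensional vector per token; a sketch building on the hidden_states tuple above:

import tensorflow as tf

# Each element of hidden_states has shape (batch_size, sequence_length, 768)
last_four = hidden_states[-4:]

# Concatenate along the hidden dimension -> (batch_size, sequence_length, 3072)
token_embeddings = tf.concat(list(last_four), axis=-1)

# Vector for the first token of the first sample: shape (3072,)
print(token_embeddings[0, 0].shape)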


Note that if the model you are loading does not return the hidden states by default, you can load the config using the BertConfig class and pass the output_hidden_states=True argument, like this:

from transformers import BertConfig, TFBertModel

config = BertConfig.from_pretrained("name_or_path_of_model",
                                    output_hidden_states=True)

bert_model = TFBertModel.from_pretrained("name_or_path_of_model",
                                         config=config)
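With that config in place, the model's outputs will again include the hidden-states tuple as their third element, so the same indexing shown above (outputs[2][1:]) applies.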
Answered Oct 18 '22 by today