I have two questions about how to use the TensorFlow implementation of Transformers for text classification.
Thank you!
The Transformer model performs quite well on text classification: in our experience, it produces the desired results for most predictions.
For tasks in which the text classes are relatively few, the best-performing text classification systems use pretrained Transformer models such as BERT, XLNet, and RoBERTa. Keep in mind, however, that Transformer-based models scale quadratically with the input sequence length (because of self-attention) and linearly with the number of classes (in the classification head).
Transformers can be used for classification tasks. I found a good tutorial where they used a BERT Transformer as the encoder and a convolutional neural network on top for sentiment analysis.
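As a rough sketch of that kind of setup (not the tutorial's own code), the example below uses the Hugging Face transformers package with TensorFlow 2.x: a frozen BERT encoder produces token embeddings, and a small Conv1D head is trained on top for a two-class sentiment task. The checkpoint name, sequence length, and number of classes are assumptions made for illustration.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

# Assumed checkpoint and sequence length, chosen only for illustration.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = TFAutoModel.from_pretrained("bert-base-uncased")
encoder.trainable = False  # keep BERT frozen; only the CNN head is trained

MAX_LEN = 128
input_ids = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="attention_mask")

# Token-level hidden states from BERT: (batch, seq_len, hidden_size)
hidden_states = encoder(input_ids, attention_mask=attention_mask).last_hidden_state

# Convolutional head over the token dimension, then a softmax classifier
x = tf.keras.layers.Conv1D(128, kernel_size=3, activation="relu")(hidden_states)
x = tf.keras.layers.GlobalMaxPooling1D()(x)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)  # e.g. negative / positive

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```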
There are two approaches you can take:

1. Average the hidden states you get from the encoder and feed that vector to your classifier.
2. Prepend a special token, [CLS] (or whatever you like to call it), and use the hidden state of that special token as input to your classifier.

The second approach is used by BERT. During pre-training, the hidden state corresponding to this special token is used to predict whether two sentences are consecutive. In downstream tasks, it is also used for sentence classification. However, my experience is that sometimes averaging the hidden states gives a better result.
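To make both options concrete, here is a minimal sketch of my own (assuming TensorFlow 2.x, the transformers package, and a bert-base-uncased checkpoint): it computes a padding-aware average of the hidden states and, separately, the hidden state of the [CLS] token; either vector can then be passed to a classifier head.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

# Assumed checkpoint, used only for illustration.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = TFAutoModel.from_pretrained("bert-base-uncased")

batch = tokenizer(["a great movie", "a terrible movie"],
                  padding=True, return_tensors="tf")
hidden = encoder(**batch).last_hidden_state   # (batch, seq_len, hidden_size)

# Approach 1: average the hidden states, ignoring padding tokens
mask = tf.cast(batch["attention_mask"][..., tf.newaxis], hidden.dtype)
mean_pooled = tf.reduce_sum(hidden * mask, axis=1) / tf.reduce_sum(mask, axis=1)

# Approach 2: take the hidden state of the first ([CLS]) token
cls_pooled = hidden[:, 0, :]

# Either pooled vector can feed a classification head, for example:
logits = tf.keras.layers.Dense(2)(cls_pooled)  # (batch, num_classes)
```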
Instead of training a Transformer model from scratch, it is probably more convenient to use (and, if needed, fine-tune) a pre-trained model (BERT, XLNet, DistilBERT, ...) from the transformers package, which provides pre-trained models ready to use in PyTorch and TensorFlow 2.0.
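As a minimal fine-tuning sketch with the TensorFlow classes of that package (the distilbert-base-uncased checkpoint, the two-example dataset, and the hyperparameters are placeholders, not a recommended configuration):

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Placeholder checkpoint and toy data, for illustration only.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

texts = ["this is great", "this is awful"]
labels = [1, 0]

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="tf")
dataset = tf.data.Dataset.from_tensor_slices((dict(enc), labels)).batch(2)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(dataset, epochs=1)
```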