
How to use Bert for long text classification?

We know that BERT has a maximum input length of 512 tokens. If an article is much longer than that, say 10,000 tokens, how can BERT be used?

user1337896 asked Oct 31 '19 03:10



People also ask

How do you deal with long texts in BERT?

BERT is incapable of processing long texts due to its quadratically increasing memory and time consumption. The most natural ways to address this problem, such as slicing the text by a sliding window or simplifying transformers, suffer from insufficient long-range attentions or need customized CUDA kernels.

Can I use BERT for text classification?

Further, usage of BERT is not limited to text or sentence classification but can also be applied to advanced Natural Language Processing applications such as next sentence prediction, question answering, or Named-Entity-Recognition tasks.

What is Max length in BERT?

The BERT block accepts any integer input size from 3 to 512. For the best performance, use the smallest size that does not truncate too much of your text (this is difficult to estimate).

Why is BERT 512 limited?

BERT has a limit of 512 tokens. Normally, for longer sequences, you just truncate to 512 tokens. The limit is derived from the positional embeddings in the Transformer architecture, for which a maximum length needs to be imposed.


1 Answer

You have basically three options:

  1. You cut the longer texts off and use only the first 512 tokens. The original BERT implementation (and probably the others as well) truncates longer sequences automatically. For most cases, this option is sufficient.
  2. You can split your text into multiple subtexts, classify each of them, and combine the results (for example, choose the class that was predicted for most of the subtexts). This option is obviously more expensive.
  3. You can even feed the output token for each subtext (as in option 2) to another network (but you won't be able to fine-tune), as described in this discussion.
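
To make options 1 and 2 concrete, here is a minimal sketch in plain Python that works on a list of token ids. The names `truncate`, `split_into_chunks`, `majority_vote`, and `classify_long_text` are made up for illustration, and `classify_chunk` is a hypothetical callable standing in for your fine-tuned BERT classifier (it would add `[CLS]`/`[SEP]` and run the model). The budget of 510 assumes two of the 512 positions go to those special tokens, and the optional `stride` overlap is my own addition, not something the options above require.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

MAX_LEN = 512            # BERT's maximum sequence length
BODY_LEN = MAX_LEN - 2   # assumption: reserve two positions for [CLS] and [SEP]


def truncate(token_ids: Sequence[int], body_len: int = BODY_LEN) -> List[int]:
    """Option 1: keep only the first `body_len` tokens."""
    return list(token_ids[:body_len])


def split_into_chunks(
    token_ids: Sequence[int], body_len: int = BODY_LEN, stride: int = 0
) -> List[List[int]]:
    """Option 2, step 1: split into chunks of at most `body_len` tokens.

    Consecutive chunks share `stride` tokens (0 means no overlap).
    """
    if not 0 <= stride < body_len:
        raise ValueError("stride must satisfy 0 <= stride < body_len")
    step = body_len - stride
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(list(token_ids[start:start + body_len]))
        if start + body_len >= len(token_ids):
            break  # this chunk already reaches the end of the text
    return chunks


def majority_vote(labels: Sequence[Hashable]) -> Hashable:
    """Option 2, step 2: return the most frequent label.

    On a tie, the label that appeared first wins.
    """
    return Counter(labels).most_common(1)[0][0]


def classify_long_text(
    token_ids: Sequence[int],
    classify_chunk: Callable[[List[int]], Hashable],
    body_len: int = BODY_LEN,
    stride: int = 0,
) -> Hashable:
    """Classify every chunk with `classify_chunk` and combine by majority vote."""
    chunks = split_into_chunks(token_ids, body_len, stride) or [[]]
    return majority_vote([classify_chunk(chunk) for chunk in chunks])


if __name__ == "__main__":
    article = list(range(10_000))  # stand-in for a 10,000-token article
    print(len(truncate(article)))           # tokens kept by option 1
    print(len(split_into_chunks(article)))  # model calls needed by option 2
```

The chunk count printed at the end is what makes option 2 "more expensive": every chunk costs one forward pass through the model.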

I would suggest trying option 1 first and considering the other options only if it is not good enough.

chefhose answered Oct 15 '22 19:10
