We know that BERT has a maximum length limit of 512 tokens. So if an article is much longer than that, say 10,000 tokens, how can BERT be used?
BERT cannot process long texts directly because its memory and time consumption grow quadratically with sequence length. The most natural workarounds, such as slicing the text with a sliding window or using simplified Transformer variants, either lose long-range attention or require customized CUDA kernels.
Further, BERT is not limited to text or sentence classification; it can also be applied to more advanced Natural Language Processing tasks such as next-sentence prediction, question answering, or Named-Entity Recognition.
The BERT block accepts any integer input size from 3 to 512 tokens. For the best performance, use the smallest size that does not cut away a significant part of your text (this can be difficult to estimate).
BERT also has the same limit of 512 tokens. For longer sequences, you normally just truncate to 512 tokens. The limit comes from the positional embeddings in the Transformer architecture, which require a fixed maximum length.
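For illustration, here is a minimal sketch of plain truncation with the Hugging Face transformers library; the checkpoint name, example text, and sequence-classification head are illustrative assumptions, not part of the answer above.

```python
# Minimal truncation sketch using Hugging Face transformers.
# "bert-base-uncased" and the classification head are illustrative choices.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

long_text = "..."  # an article far longer than 512 tokens

# truncation=True keeps only the first 512 positions, including the
# [CLS] and [SEP] special tokens added by the tokenizer.
inputs = tokenizer(long_text, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
```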
You have basically three options:

1. Truncate the text and only use the first 512 tokens.
2. Split the text into chunks (e.g., with a sliding window), run BERT on each chunk, and aggregate the per-chunk results.
3. Use a Transformer variant designed for longer inputs instead of vanilla BERT.

I would suggest trying option 1, and only if this is not good enough, consider the other options.
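As a rough sketch of option 2, the Hugging Face fast tokenizer can emit overlapping 512-token windows via return_overflowing_tokens; the stride value and the mean aggregation of per-window logits below are illustrative choices, not prescribed by the answer.

```python
# Sliding-window sketch for option 2: classify overlapping 512-token
# windows and average their logits. Stride and mean-pooling are
# illustrative choices. Requires a fast tokenizer (the default here).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

long_text = "..."  # an article far longer than 512 tokens

enc = tokenizer(
    long_text,
    truncation=True,
    max_length=512,
    stride=128,                      # tokens of overlap between consecutive windows
    return_overflowing_tokens=True,  # return every window, not just the first
    padding=True,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(
        input_ids=enc["input_ids"],
        attention_mask=enc["attention_mask"],
    ).logits                         # shape: (num_windows, num_labels)

doc_logits = logits.mean(dim=0)      # one aggregated prediction for the document
```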