
How to use BERT pretrain embeddings with my own new dataset?

My dataset and NLP task are very different from the large corpora the authors pre-trained their model on (https://github.com/google-research/bert#pre-training-with-bert), so I can't directly fine-tune. Is there any example code or GitHub repository that can help me train BERT on my own data? I expect to get embeddings like GloVe.

Thank you very much!

asked Nov 23 '25 08:11 by BB8

1 Answer

Yes, you can get BERT embeddings, like other word embeddings, using the extract_features.py script. You can select the layers from which you want the output. Usage is simple: save one sentence per line in a text file and pass it as input. The output is a JSONL file with contextual embeddings for each token.
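For concreteness, here is a minimal sketch of how you might read that JSONL output back into per-token vectors. The file name output.jsonl and the choice of layer are assumptions for illustration; the field names follow the output format described in the repository's README (linked below).

```python
import json
import numpy as np

# First run the script from the BERT repo, roughly as documented in the
# README linked below, e.g.:
#   python extract_features.py --input_file=input.txt \
#     --output_file=output.jsonl --vocab_file=... --bert_config_file=... \
#     --init_checkpoint=... --layers=-1
#
# Each line of the resulting JSONL file corresponds to one input sentence.
token_vectors = []  # (token, vector) pairs for the first sentence
with open("output.jsonl", "r", encoding="utf-8") as f:
    record = json.loads(f.readline())
    for feature in record["features"]:
        token = feature["token"]
        # layers[0] is the first layer you requested via --layers
        # (e.g. -1 for the top layer); "values" is the embedding,
        # 768-dimensional for BERT-Base.
        vector = np.array(feature["layers"][0]["values"])
        token_vectors.append((token, vector))

for token, vector in token_vectors[:5]:
    print(token, vector.shape)
```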

The script's usage and documentation are available at: https://github.com/google-research/bert#using-bert-to-extract-fixed-feature-vectors-like-elmo
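One caveat, since you asked for GloVe-like vectors: BERT embeddings are contextual, so the same word gets a different vector in every sentence. If you want a single fixed vector per sentence, one common approach is to pool the token vectors yourself; the mean pooling below is an assumption of this sketch, not something the script does for you.

```python
import json
import numpy as np

def sentence_embeddings(jsonl_path, layer=0):
    """Mean-pool the per-token vectors into one vector per input line.

    `layer` indexes into the layers requested with --layers; mean
    pooling is an illustrative choice, not part of the BERT script.
    """
    embeddings = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            vectors = [
                np.array(feat["layers"][layer]["values"])
                for feat in record["features"]
            ]
            embeddings.append(np.mean(vectors, axis=0))
    return embeddings

# Example usage (assumes output.jsonl was produced as shown above):
# sentence_vecs = sentence_embeddings("output.jsonl")
```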

answered Nov 27 '25 15:11 by Ashwin Geet D'Sa


