My dataset and NLP task are very different from the large corpus the authors pre-trained their model on (https://github.com/google-research/bert#pre-training-with-bert), so I can't directly fine-tune. Is there any example code or GitHub repo that can help me train BERT on my own data? I expect to get embeddings like GloVe.
Thank you very much!
Yes, you can extract BERT embeddings, much like other word embeddings, using the extract_features.py script, and you can choose which layers' outputs you want. Usage is simple: save one sentence per line in a text file and pass it as input. The output is a JSONL file containing a contextual embedding for each token, as shown in the sketches below.
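A typical invocation looks roughly like the one documented in the linked README; the /tmp paths are placeholders, and $BERT_BASE_DIR is assumed to point at a downloaded pre-trained checkpoint directory:

```shell
# Extract token-level embeddings from the last four layers.
python extract_features.py \
  --input_file=/tmp/input.txt \
  --output_file=/tmp/output.jsonl \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --layers=-1,-2,-3,-4 \
  --max_seq_length=128 \
  --batch_size=8
```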
The script's usage and documentation are provided at: https://github.com/google-research/bert#using-bert-to-extract-fixed-feature-vectors-like-elmo
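If it helps, here is a minimal sketch of how you might read the output, assuming the JSONL layout the repo describes (one JSON object per input sentence, with a "features" list holding each token's per-layer "values"); the function name load_embeddings is just for illustration:

```python
import json
import numpy as np

def load_embeddings(jsonl_path, layer_index=-1):
    """Yield (tokens, matrix) pairs, one per input sentence.

    Each matrix row is one token's embedding taken from the requested
    layer; layer_index must match one of the --layers values used above.
    """
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            tokens, vectors = [], []
            for feature in record["features"]:
                tokens.append(feature["token"])
                # Pick the layer whose "index" matches layer_index.
                layer = next(l for l in feature["layers"]
                             if l["index"] == layer_index)
                vectors.append(layer["values"])
            yield tokens, np.array(vectors)

for tokens, embeddings in load_embeddings("/tmp/output.jsonl"):
    print(tokens, embeddings.shape)  # e.g. ['[CLS]', 'hello', ...] (n, 768)
```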