I want to fine-tune BERT on texts that are related to a specific domain (in my case related to engineering). The training should be unsupervised since I don't have any labels or anything. Is this possible?
Unlike previous models, BERT is a deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus.
Fine-tuning is not always necessary. Instead, the feature-based approach, in which we simply extract pre-trained BERT embeddings as features, can be a viable and cheap alternative. However, it is important not to use only the final layer, but at least the last four layers, or all of them.
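As a minimal sketch of that feature-based approach, the following extracts token embeddings from the last four hidden layers of a pre-trained BERT model with the Transformers library. The model name, the example sentence, and the sum-pooling choice are illustrative assumptions, not a prescribed recipe.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

text = "The gearbox transmits torque from the motor to the drive shaft."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple: the embedding layer output plus one tensor per encoder layer.
hidden_states = outputs.hidden_states            # 13 tensors of shape (1, seq_len, 768)
last_four = torch.stack(hidden_states[-4:])      # (4, 1, seq_len, 768)

# One common pooling choice: sum the last four layers for each token.
token_features = last_four.sum(dim=0)            # (1, seq_len, 768)
print(token_features.shape)
```

Concatenating the last four layers instead of summing them is another common option; which works better tends to be task-dependent.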
We instead find that fine-tuning primarily affects the top layers of BERT, but with noteworthy variation across tasks. In particular, dependency parsing reconfigures most of the model, whereas SQuAD and MNLI appear to involve much shallower processing.
What you in fact want to do is continue pre-training BERT on text from your specific domain. In this case, you keep training the model with the masked language modelling (MLM) objective, but on your domain-specific data.
You can use the run_mlm.py script from Hugging Face's Transformers library.
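If you prefer to do this directly in Python rather than through the script, a minimal sketch using the Trainer API could look like the following. The corpus path (one text per line) and the hyperparameters are placeholders you would adapt to your data.

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Load the raw domain corpus (placeholder path) and tokenize it.
dataset = load_dataset("text", data_files={"train": "engineering_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# The collator dynamically masks 15% of the tokens for the MLM objective.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

training_args = TrainingArguments(
    output_dir="bert-engineering-mlm",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    save_steps=10_000,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collator,
    train_dataset=tokenized["train"],
)
trainer.train()
trainer.save_model("bert-engineering-mlm")
```

After this continued pre-training step, you can load the saved checkpoint for downstream fine-tuning or feature extraction just like any other BERT model.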