Search for job titles in an article using Spacy or NLTK

I'm new to NLP and have recently been playing with NLTK and spaCy. However, I could not find a way to search for job titles (e.g. product manager, chief marketing officer) in an article.

For example, I have 1000 articles and I want to find all the articles that mention the job titles I am interested in.

Also, what entity type do job titles fall under? I checked https://spacy.io/docs/usage/entity-recognition and did not see one there. Is there a plan to add it?

Thanks.

asked Feb 05 '23 10:02 by user643132


1 Answer

spaCy's NER does not support a "job title" entity type, as Nathan also stated. You can, however, train a custom named entity for your use case. The official spaCy documentation has a step-by-step guide to training the NER.

You will need labeled data to train your NER. Generally you need at least 4000-5000 examples for training and 2000 examples for testing. The more training data you have, the better the NER will perform.
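If all of your labeled examples sit in one list, a shuffled split is one simple way to hold out a test set. This is only a sketch in plain Python: the helper name `split_examples`, the 0.3 test fraction, and the placeholder sentences are illustrative assumptions, not anything provided by spaCy.

```python
import random


def split_examples(examples, test_fraction=0.3, seed=0):
    """Shuffle a copy of the labeled examples and return (train, test)."""
    shuffled = list(examples)              # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data standing in for real labeled sentences (hypothetical).
examples = [(f"sentence {i}", {"entities": []}) for i in range(10)]
train, test = split_examples(examples)
print(len(train), len(test))
```

Keeping the test examples out of training is what makes the evaluation score meaningful.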

Here is some sample training data.

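# Each example is (text, {'entities': [(start_char, end_char, label)]}).
# Offsets are character positions in the text; end_char is exclusive.
TRAIN_DATA = [
    ('Who is Shaka Khan?', {
        'entities': [(7, 17, 'PERSON')]
    }),
    ('I like London and Berlin.', {
        'entities': [(7, 13, 'LOC'), (18, 24, 'LOC')]
    }),
    ('I work as software engineer.', {
        # 'software engineer' starts at character 10 and ends before 27
        'entities': [(10, 27, 'JOBTITLE')]
    }),
]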
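Character offsets in this format are easy to get wrong by one or two positions, so it can help to compute them from the text instead of counting by hand. The sketch below uses only plain Python string methods. The helpers `make_example` and `check_example` are hypothetical names made up for illustration, and the code assumes the phrase occurs in the sentence and that only its first occurrence should be labeled.

```python
def make_example(text, phrase, label):
    """Build one (text, annotations) tuple for the first occurrence of phrase."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in {text!r}")
    end = start + len(phrase)  # end offset is exclusive
    return (text, {"entities": [(start, end, label)]})


def check_example(example):
    """Return the substring and label that each annotated span actually covers."""
    text, annotations = example
    return [(text[start:end], label)
            for start, end, label in annotations["entities"]]


example = make_example("I work as software engineer.", "software engineer", "JOBTITLE")
print(example)
print(check_example(example))
```

Running `check_example` over every training item before training is a cheap way to spot spans that do not line up with the intended words.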
answered Mar 20 '23 22:03 by joel