
Parsing/identifying sections in job descriptions

I'm trying to solve quite a difficult problem - building a generic parser for job descriptions. The idea is, given a job description, the parser should be able to identify and extract different sections such as job title, location, job description, responsibilities, qualifications etc. The job description will basically be scraped from a web page.

A rule-based approach (such as regular expressions) doesn't work since the scenario is too generic. My next approach was to train a custom NER classifier using spaCy; I've done this numerous times before. However, I'm running into several problems.

  1. The entities can be very short (location, job title etc.) or very long (responsibilities, qualifications etc.). I'm not sure how well NER works if the entities are several lines or a paragraph long. Most of the use cases I've seen are those in which the entities aren't longer than a few words. Does spaCy's NER work well if the text of the entities I want to identify is quite long? (I can give examples if required to make it clearer.)

  2. Is there any other strategy besides NER that I can use to parse these job descriptions as I've mentioned?

Any help here would be greatly appreciated. I've been banging my head against different walls for a few months, and I have made some progress, but I'm not sure if I'm on the right track, or if a better approach exists.

Azfar Imtiaz asked Dec 01 '25 21:12


1 Answer

I would suggest building a rule-based baseline using flashtext first. Depending on your data, it gives pretty decent results and runs fast. Pair it with a good feedback mechanism so you can correct its output and curate labeled data for a sequence tagging model that parses your job descriptions. Then use that data to build an NER model with flair, a state-of-the-art library.
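To make the baseline idea concrete, here is a minimal sketch of keyword-driven section splitting. It uses only the standard library and a plain regex lookup as a stand-in for flashtext (whose `KeywordProcessor` could take over the keyword matching). The keyword lists, the `split_sections` and `match_header` names, and the six-word header limit are all assumptions for illustration, not something tuned on real job postings.

```python
import re

# Hypothetical header keywords; a real list would be curated from your own data.
SECTION_KEYWORDS = {
    "responsibilities": ["responsibilities", "what you will do", "duties"],
    "qualifications": ["qualifications", "requirements", "what we are looking for"],
    "location": ["location"],
}


def _normalize(text):
    # Lowercase and replace every run of non-letters with a single space.
    return " ".join(re.sub(r"[^a-z]+", " ", text.lower()).split())


def match_header(text, max_words=6):
    """Return a section label if the text looks like a short header, else None."""
    normalized = _normalize(text)
    if not normalized or len(normalized.split()) > max_words:
        return None
    for label, keywords in SECTION_KEYWORDS.items():
        for keyword in keywords:
            if re.search(r"\b" + re.escape(keyword) + r"\b", normalized):
                return label
    return None


def split_sections(text):
    """Group non-empty lines under the most recent header that matched."""
    sections = {}
    current = "preamble"  # lines seen before any known header
    for line in text.splitlines():
        if not line.strip():
            continue
        # Treat "Header: inline value" as a header plus its first content line.
        head, _, rest = line.partition(":")
        label = match_header(head)
        if label is not None:
            current = label
            sections.setdefault(current, [])
            if rest.strip():
                sections[current].append(rest.strip())
            continue
        sections.setdefault(current, []).append(line.strip())
    return sections


sample = """Senior Data Engineer
Location: Berlin

Key Responsibilities
- Build data pipelines
- Review code

Requirements:
- 5 years of Python
"""

result = split_sections(sample)
for name, lines in result.items():
    print(name, lines)
```

A baseline like this will mislabel short body lines that happen to contain a keyword, which is where the feedback loop comes in: corrected output becomes training data for the sequence tagger.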

Poorna Prudhvi answered Dec 03 '25 14:12
