Avoiding hard-coded rules for specific patterns as much as possible.
I'm currently working on a project similar to AWS Textract (link here). I've been successful at extracting data from files, but only in an unstructured way. Now I'm trying to figure out the best way to get the existing key-value pairs out of that mass of information.
For example, suppose we have a text like this:
In this document we will find different key and values like this id : 1 and that country : France with no specific punctuation and probably talking about how good is my health...
The extraction would be something like this:
id : 1
country : France
health : good
What I know so far is that Amazon uses a "confidence" value when extracting information in this kind of scenario, which I guess involves some machine-learning algorithm. In my case, I don't have that large a dataset to learn from.
I'm pretty sure there is an easier solution that is nonetheless flexible.
I believe the spaCy library may be the right tool for your needs. Check out its description on GitHub to get a sense of what it does.
It can be exposed to Node.js using the spacy-nlp package.
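Before bringing in an NLP library, it may help to see how far a plain baseline gets on the example from the question. The sketch below is only an illustration, not spaCy and not how Textract works: it uses Python's standard `re` module, and the names `PAIR_RE`, `extract_explicit_pairs` and `SAMPLE` are hypothetical. It assumes a key and its value are single words separated by a colon.

```python
import re

# Hypothetical baseline: matches only explicit "word : word" pairs.
PAIR_RE = re.compile(r"(\w+)\s*:\s*(\w+)")


def extract_explicit_pairs(text: str) -> dict:
    """Return {key: value} for every 'key : value' occurrence in text."""
    return {key: value for key, value in PAIR_RE.findall(text)}


# The example sentence from the question.
SAMPLE = (
    "In this document we will find different key and values like this "
    "id : 1 and that country : France with no specific punctuation and "
    "probably talking about how good is my health..."
)

if __name__ == "__main__":
    print(extract_explicit_pairs(SAMPLE))
    # prints {'id': '1', 'country': 'France'}
```

This recovers `id` and `country` but not `health : good`, because that pair is only implied by the sentence. Getting it requires some understanding of sentence structure, for example the part-of-speech tags and dependency parse that spaCy provides, which is why a pure pattern approach falls short here.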