I want to take unstructured data in the form of files, process it, and store it in a database for retrieval.
The data will be in natural language and the queries to get information will also be in natural language.
Ex: the data could be "Roses are red" and the query could be "What is the color of a rose?"
I have looked at several NLP systems, focusing on open-source information extraction and relation extraction systems, and the following seems apt and easy for a quick start:
https://www.npmjs.com/package/mitie
This gives data in the form of (word, type) pairs. It also returns relations as a result of the processing (see the example on the package page).
I want to know whether SQL is a good database choice for storing this information. To retrieve the information, I will also need to convert the natural language query into some kind of (word, meaning) pairs, and to use SQL I will have to write a layer that converts natural language to SQL queries.
Please suggest any open-source databases that work well in this situation. I'm also open to suggestions for databases that work with other open-source information extraction and relation extraction systems, if not MITIE.
There are many NLP engines on the market, from Google's Dialogflow (previously known as API.ai) to Wit.ai, Watson Conversation Service, Lex, and more. Some services provide an all-in-one solution, while others focus on solving a single problem.
Comprehending natural language text, with its inherent challenges of ambiguity and co-reference, has been a longstanding problem in Natural Language Processing (NLP).
Natural language processing strives to build machines that understand text or voice data and respond with text or speech of their own, in much the same way humans do.
SQL won't be an appropriate choice for your problem. You can use NLP or rules to extract relationships and then store those relationships in a triple store or a graph database. There are many good open-source graph databases, such as Neo4j and Titan (since continued as JanusGraph). You can also search for triple stores; I suppose Apache Jena would be a good choice. After storing your data, you can query your graphs with a graph query language such as Gremlin or Cypher (or SPARQL for a triple store), much as you would query a relational database with SQL. Note that the heart of your system would be a knowledge graph.
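To make the triple idea concrete, here is a minimal in-memory sketch in plain Python. It is not the API of Neo4j, Jena, or any other product; the `TripleStore` class, the predicate names (`has_color`, `is_a`), and the hand-written triples are all assumptions for illustration. In a real system the triples would come from your extraction step, and the query pattern would come from parsing the question.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Tiny illustrative triple store; None in a pattern acts as a wildcard."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> list[Triple]:
        # Keep every triple that agrees with each non-wildcard position.
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
# Hand-written stand-ins for what extraction might yield from "Roses are red".
store.add("rose", "has_color", "red")
store.add("rose", "is_a", "flower")
store.add("violet", "has_color", "blue")

# "What is the color of a rose?" reduced (by hand here) to a pattern
# with the object left unknown.
answers = [o for _, _, o in store.match(subject="rose", predicate="has_color")]
print(answers)  # ['red']
```

The point is only the shape of the data: a question becomes a triple pattern with one unknown slot, which is what graph query languages express at scale.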
You may also set up a Lucene/Solr-based search system over your unstructured data, which may help answer queries in conjunction with the graph database. All of these (NLP, IR, graph databases/triple stores, etc.) would coexist to solve your problem.
It would be like an ensemble. No silver bullets :) However, to start with, look at graph databases or triple stores.
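For the search side, the core structure behind Lucene-style engines is an inverted index. The sketch below is a toy version in plain Python, not Lucene or Solr code; the `InvertedIndex` class, the `tokenize` helper, and the sample documents are assumptions for illustration. It ranks documents by how many distinct query terms they contain, with no stemming or stop-word handling.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


class InvertedIndex:
    """Toy inverted index: maps each term to the ids of documents containing it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[int]] = defaultdict(set)
        self._docs: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self._docs[doc_id] = text
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> list[str]:
        # Score each document by the number of distinct query terms it contains.
        scores: dict[int, int] = defaultdict(int)
        for term in set(tokenize(query)):
            for doc_id in self._postings.get(term, ()):
                scores[doc_id] += 1
        # Highest score first; ties broken by document id.
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return [self._docs[d] for d in ranked]


index = InvertedIndex()
index.add(1, "Roses are red")
index.add(2, "Violets are blue")
index.add(3, "The sky is blue")

results = index.search("Which flowers are blue?")
print(results[0])  # Violets are blue
```

Note that a query containing "rose" would not match "Roses" here, because nothing normalizes word forms. Real search engines handle that during text analysis, which is one reason to use one rather than roll your own.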