
Which database can be used to store processed data from an NLP engine?

I am looking at taking unstructured data in the form of files, processing it and storing it in a database for retrieval. The data will be in natural language and the queries to get information will also be in natural language. Ex: the data could be "Roses are red" and the query could be "What is the color of a rose?"

I have looked at several NLP systems, focusing on open-source information extraction and relation extraction systems, and the following seems apt and easy for a quick start: https://www.npmjs.com/package/mitie

This can give data in the form of (word, type) pairs. It also gives relations as a result of running the processing (check the example on the site).

I want to know whether an SQL database is a good place to save this information. To retrieve the information, I will also need to convert the natural language query to some kind of (word, meaning) pairs, and to use SQL I will have to write a layer that converts natural language to SQL queries.

Please suggest any open-source databases that work well in this situation. If not MITIE, I'm open to suggestions for databases that work with other open-source information extraction and relation extraction systems.

Swati Pardeshi asked May 24 '17 07:05




1 Answer

SQL won't be an appropriate choice for your problem. You can use NLP or rules to extract relationships and then store those relationships in a triple store or a graph database. There are many good open-source graph databases, such as Neo4j and Titan. You can search for triple stores; I suppose Apache Jena should be a good choice. After storing your data, you can query your graph using a graph query language such as Gremlin or Cypher, which play the role that SQL plays for relational data. Note that the heart of your system would be a knowledge graph.
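To make the triple idea concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j, Jena, or any real store works internally; the `TripleStore` class, the `has_color` predicate, and the sample facts are all made up for illustration, and it assumes your extraction step already produces (subject, predicate, object) tuples.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Toy triple store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._triples = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
# Assumed output of a relation extraction step run on "Roses are red" etc.
store.add("rose", "has_color", "red")
store.add("violet", "has_color", "blue")
store.add("rose", "is_a", "flower")

# "What is the color of a rose?" becomes the pattern (rose, has_color, ?)
answers = [o for _, _, o in store.match(subject="rose", predicate="has_color")]
print(answers)
```

The hard part remains turning the natural language question into the pattern; the store itself only answers pattern lookups, which is what the graph query languages above do at scale.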

You may also set up a Lucene/Solr-based search system on your unstructured data, which may help answer your queries in conjunction with the graph database. All of these (NLP, IR, graph databases/triple stores, etc.) would coexist to solve your problem.
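As a rough picture of what the search side contributes, here is a toy inverted index in Python. It is only a sketch of the idea behind keyword retrieval, not the Lucene or Solr API; the `InvertedIndex` class, the stop-word list, and the crude plural stripping are assumptions made for this example.

```python
import re
from collections import defaultdict
from typing import List

# Tiny hand-picked stop-word list (an assumption for this example).
STOP_WORDS = {"a", "are", "is", "of", "the", "what"}


def tokenize(text: str) -> List[str]:
    """Lowercase, drop stop words, and strip a trailing 's' (very crude stemming)."""
    terms = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in STOP_WORDS:
            continue
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]
        terms.append(word)
    return terms


class InvertedIndex:
    """Toy term -> document-id index (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> List[str]:
        """Rank documents by the number of distinct query terms they contain."""
        scores = defaultdict(int)
        for term in set(tokenize(query)):
            for doc_id in self._postings.get(term, ()):
                scores[doc_id] += 1
        return sorted(scores, key=lambda d: (-scores[d], d))


index = InvertedIndex()
index.add("doc1", "Roses are red")
index.add("doc2", "Violets are blue")
index.add("doc3", "Sugar is sweet")

hits = index.search("What is the color of a rose?")
print(hits)
```

A search index like this finds the passages that mention the query terms, while the graph holds the extracted relations; the two can be consulted together when answering a question.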

It would be like an ensemble. No silver bullets :) However, to start with, look at graph databases or triple stores.

Yavar answered Nov 15 '22 08:11
