 

Grakn: how can I construct a knowledge graph from a collection of texts?

I have several documents (PDF and TXT) in my notebook, and I want to construct a knowledge graph from them using Grakn.

Through Google I found this blog post, but there is no documentation or readme explaining how to do it.

The blog also says "The script to mine text can be found on our GitHub repo here", but I am failing to understand what I have to do.

Can someone here advise me how to construct a knowledge graph from text using Grakn?

asked Mar 26 '20 by R. S.

2 Answers

Grakn is a knowledge engine/network that understands knowledge through well-defined entities and relations (ontologies), so you need NLP (natural language processing) to make human language accessible to a graph network. You will also need OCR (optical character recognition) to convert image-based text into plain text, and you should teach the network basic ontologies so it can understand the texts. You are essentially heading into Singularity-era territory.
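
For the text-extraction step, a rough sketch could look like the one below. It assumes the pdfminer.six, Pillow and pytesseract packages, which are my own choices and not part of Grakn or the blog post:

```python
# Rough sketch: get plain text out of the source documents first.
# Assumes pdfminer.six (for text-based PDFs) and pytesseract + Pillow
# (for scanned pages/images); adapt paths and packages to your setup.
from pdfminer.high_level import extract_text
from PIL import Image
import pytesseract


def pdf_to_text(path):
    """Extract the embedded text of a PDF that already contains text."""
    return extract_text(path)


def image_to_text(path):
    """OCR an image (e.g. a scanned page) into plain text."""
    return pytesseract.image_to_string(Image.open(path))


print(pdf_to_text("paper.pdf")[:500])           # "paper.pdf" is a placeholder
print(image_to_text("scanned_page.png")[:500])  # "scanned_page.png" too
```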

answered Oct 19 '22 by Aref Riant


To give an example of how to go from a collection of text to a knowledge graph, let us assume that all of your text is concerned with a certain domain of knowledge - in the example of the blog post you mention, we are dealing with biomedical research publications.

A first step could be to find entities, or defined "things", in the text. To stick with the biomedical example, we could look for drugs and genes mentioned in the publications. This is called named-entity-recognition (NER), a technique applied in text-mining.
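
As a toy illustration of the simplest form of NER, dictionary matching (the drug and gene lists below are made-up placeholders, not taken from the blog post):

```python
import re

# Toy dictionaries; in practice these would come from curated vocabularies
# such as drug or gene databases (the terms here are purely illustrative).
DRUGS = {"aspirin", "ibuprofen", "metformin"}
GENES = {"BRCA1", "TP53", "EGFR"}


def find_entities(text):
    """Return the drug and gene terms mentioned in a piece of text."""
    tokens = set(re.findall(r"[A-Za-z0-9\-]+", text))
    drugs = {t for t in tokens if t.lower() in DRUGS}
    genes = {t for t in tokens if t in GENES}  # gene symbols are case-sensitive
    return drugs, genes


print(find_entities("Aspirin use was correlated with TP53 expression."))
# ({'Aspirin'}, {'TP53'})
```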

If a certain drug is often mentioned in the same publication as a particular gene, they "co-occur" and are likely related in some way. This would be an example of a relationship. The automated extraction of exactly how they are related is a difficult problem and is called relationship-extraction (RE).
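
A minimal co-occurrence count over a document collection might look like this, building on the hypothetical find_entities helper from the NER sketch above:

```python
from collections import Counter
from itertools import product


def cooccurrences(documents):
    """Count how often each (drug, gene) pair appears in the same document."""
    counts = Counter()
    for text in documents:
        drugs, genes = find_entities(text)  # from the NER sketch above
        for drug, gene in product(drugs, genes):
            counts[(drug.lower(), gene)] += 1
    return counts


docs = [
    "Aspirin use was correlated with TP53 expression.",
    "We observed no effect of aspirin on TP53 or EGFR.",
]
print(cooccurrences(docs).most_common())
# [(('aspirin', 'TP53'), 2), (('aspirin', 'EGFR'), 1)]
```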

Solutions for both NER and RE are usually domain-specific (ranging from simple matching of dictionary terms to AI models).

If you are interested in text-mining, a good place to start in python is NLTK.
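
NLTK ships a small pretrained named-entity chunker you can try out of the box. Note that it only covers generic entity types such as PERSON and ORGANIZATION; for drugs and genes you would need a domain-specific model or dictionary:

```python
import nltk

# One-time downloads of the models NLTK's built-in NER pipeline needs.
for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    nltk.download(pkg, quiet=True)

sentence = "Grakn Labs was founded in London."
tokens = nltk.word_tokenize(sentence)  # split the sentence into tokens
tagged = nltk.pos_tag(tokens)          # part-of-speech tag the tokens
tree = nltk.ne_chunk(tagged)           # generic NER: PERSON, ORGANIZATION, GPE, ...
print(tree)
```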

The idea of a knowledge graph is to put defined things, called entities, into defined relationships to one another in order to create context. Once you have a list of the entities found in all your documents, as well as their relationships (as in the example above, co-occurrence in a document or even in a single sentence), you can define a schema, upload the entities and relationships into Grakn, and use all of its functionality to analyze your data.
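
As a very rough sketch of that last step, here is how schema definition and data loading could look with the Grakn 1.x Python client (grakn-client) and 1.x Graql syntax. The keyspace name, attribute names and schema are illustrative, and the schema language has changed between Grakn versions, so check the documentation for the version you are running:

```python
from grakn.client import GraknClient  # pip install grakn-client (Grakn 1.x)

# Illustrative schema: drugs and genes related by co-occurrence.
SCHEMA = """
define
name sub attribute, datatype string;
co-occurrence sub relation, relates mentioned-drug, relates mentioned-gene;
drug sub entity, has name, plays mentioned-drug;
gene sub entity, has name, plays mentioned-gene;
"""

with GraknClient(uri="localhost:48555") as client:
    with client.session(keyspace="biomed") as session:
        # Load the schema once.
        with session.transaction().write() as tx:
            tx.query(SCHEMA)
            tx.commit()
        # Insert one extracted co-occurrence (the values are placeholders;
        # in practice you would loop over your NER/RE output).
        with session.transaction().write() as tx:
            tx.query('''
                insert
                $d isa drug, has name "aspirin";
                $g isa gene, has name "TP53";
                (mentioned-drug: $d, mentioned-gene: $g) isa co-occurrence;
            ''')
            tx.commit()
```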

For a tutorial on how to use Grakn with already extracted data, see here.

answered Oct 19 '22 by hkuich