How to build a knowledge graph?

Tags: algorithm, search, graph, artificial-intelligence

I prototyped a tiny search engine with PageRank that works on my computer. I am interested in building a Knowledge Graph on top of it, so that it returns only the queried webpages that fit the right context, similar to how Google finds relevant answers to search questions. I have seen a lot of publicity around Knowledge Graphs, but not much literature and almost no pseudocode-like guidelines for building one. Does anyone know good references on how such Knowledge Graphs work internally, so that I do not have to work out my own models of a KG?

843

asked Apr 05 '15 19:04

Pippi

2 Answers

"Knowledge graph" is a buzzword. It is a collection of models and technologies put together to achieve a result. The first stops on your journey are natural language processing, ontologies and text mining. This is a wide field of artificial intelligence, so it is worth starting with a research survey of the field.

Before building your own models, I suggest you try the standard algorithms using dedicated toolboxes such as gensim. Along the way you will learn about tf-idf, LDA, document feature vectors, etc.
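To give a feel for what tf-idf computes, here is a minimal sketch in plain Python. It is not gensim's API; libraries such as gensim apply their own weighting and normalization variants. The function name `tf_idf` and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (plain tf * idf)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample documents.
docs = [
    "graph search ranks web pages",
    "knowledge graph links entities",
    "text mining extracts entities from pages",
]
for vector in tf_idf(docs):
    # Show the two highest-weighted terms of each document.
    print(sorted(vector.items(), key=lambda kv: -kv[1])[:2])
```

Terms that occur in every document get a weight of zero, which is the point of the idf factor: they do not help to tell documents apart.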

I am assuming you want to work with text data. Searching images by other images is a different problem, and the same goes for audio.

Building models is only the first step. The most difficult part of Google's knowledge graph is actually scaling to billions of requests each day.

A good processing pipeline can be built "easily" on top of Apache Spark, "the current-gen Hadoop". It provides the resilient distributed dataset (RDD) abstraction, which is what you need if you want to scale.

If you want to keep your data as a graph, in the graph-theory sense (as with PageRank), for live querying, I suggest you use Bulbs, a framework which is "Like an ORM for graphs, but instead of SQL, you use the graph-traversal language Gremlin to query the database". You can switch the backend from Neo4j to OpenRDF (useful if you work with ontologies), for instance.

For graph analytics you can use Spark's GraphX module or GraphLab.
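As an illustration of the kind of job such graph analytics tools run, here is a small power-iteration PageRank in plain Python. It is only a sketch, not GraphX or GraphLab code; the function name, the damping value of 0.85, the iteration count and the toy `web` graph are all assumptions for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over {node: [outgoing neighbours]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        # Each node passes its rank on in equal shares to its out-links.
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy web: page -> pages it links to.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

On a single machine this is enough for small graphs. The distributed frameworks matter once the link structure no longer fits in memory.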

Hope it helps.

105

answered Oct 05 '22 06:10

Kirell

I know I'm really late, but first some terminology: Knowledge Graph and Ontology are similar terms (I'm speaking within the Semantic Web paradigm). In the Semantic Web stack the foundation is RDF, a language for defining graphs as triples (Subject, Predicate, Object). RDFS is a layer on top of RDF. It defines a meta-model, e.g., predicates such as rdf:type and nodes such as rdfs:Class. Although RDFS provides a meta-model, there is no logical foundation for it, so there are no reasoners that can validate the model or do further reasoning on it. The layer on top of RDFS is OWL (Web Ontology Language). It has a formal semantics defined by Description Logic, which is a decidable subset of First Order Logic. It also has more predefined nodes and links, such as owl:Class, owl:ObjectProperty, etc.

So when people use the term ontology they typically mean an OWL model. When they use the term Knowledge Graph, they may mean an ontology defined in OWL (because OWL is still ultimately an RDF graph) or just a graph in RDF/RDFS.
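To make the triple model concrete, here is a minimal sketch in plain Python: a graph is just a set of (subject, predicate, object) tuples, and a query is a pattern with wildcards. This is not a real triplestore or SPARQL. The `ex:` names and the `match` helper are hypothetical, while `rdf:type` and `rdfs:subClassOf` are actual RDF/RDFS terms.

```python
# Each fact is a (subject, predicate, object) triple, as in RDF.
# The "ex:" names are made up for this example.
triples = {
    ("ex:Margherita", "rdf:type", "ex:Pizza"),
    ("ex:Pizza", "rdfs:subClassOf", "ex:Food"),
    ("ex:Margherita", "ex:hasTopping", "ex:Mozzarella"),
    ("ex:Mozzarella", "rdf:type", "ex:Cheese"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Everything that has a declared type:
print(match(triples, predicate="rdf:type"))
# Everything known about one node:
print(match(triples, subject="ex:Margherita"))
```

A reasoner would go further and infer, for example, that ex:Margherita is also an ex:Food. This sketch only stores and matches the stated facts.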

I said that because IMO the best way to build a knowledge graph is to define an ontology and then use various Semantic Web tools to load data (e.g., from spreadsheets) into the ontology. The best tool to start with, IMO, is the Protégé ontology editor from Stanford. It's free and open source, and it is very reliable and intuitive. There is also a good tutorial on how to use Protégé and learn OWL, as well as other Semantic Web tools such as SPARQL and SHACL. That tutorial can be found here: New Protege Pizza Tutorial (disclosure: that links to my site, I wrote the tutorial).

If you want to get into the lower levels of the graph, you probably want to check out a triplestore, which is a graph database designed for OWL and RDF models. The free version of Franz Inc's AllegroGraph triplestore is easy to use and supports up to 5M triples. Another good triplestore that is free and open source is part of the Apache Jena framework.
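The idea of loading spreadsheet data into a graph can be sketched in plain Python as follows. This is not how Protégé or any particular triplestore imports data; the column names (`id`, `type`, `hasTopping`), the `ex:` prefix and the `rows_to_triples` helper are assumptions made for the example.

```python
import csv
import io

# Hypothetical spreadsheet export: one row per entity, one column per property.
SHEET = """id,type,hasTopping
Margherita,Pizza,Mozzarella
Hawaiian,Pizza,Pineapple
"""


def rows_to_triples(csv_text, prefix="ex:"):
    """Turn each non-empty cell into a (subject, predicate, object) triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = prefix + row["id"]
        for column, value in row.items():
            if column == "id" or not value:
                continue
            # The "type" column becomes rdf:type; other columns become properties.
            predicate = "rdf:type" if column == "type" else prefix + column
            triples.append((subject, predicate, prefix + value))
    return triples


for triple in rows_to_triples(SHEET):
    print(triple)
```

In a real setup the ontology defines which classes and properties are allowed, and the import step maps columns onto them instead of minting names from the column headers.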

answered Oct 05 '22 08:10

Michael DeBellis

