 

Is it possible to store graphs in HBase? If so, how do you model the database to support a graph structure?

I have been playing around with using graphs to analyze big data. It's been working great and has been really fun, but I'm wondering what to do as the data gets bigger and bigger.

Let me know if there's another solution, but I thought of trying HBase because it scales horizontally and I can get Hadoop to run analytics on the graph (most of my code is already written in Java). However, I'm unsure how to structure a graph in a NoSQL database. I know each node can be an entry in the database, but I'm not sure how to model edges and add properties to them (like names of nodes, attributes, PageRank, weights on edges, etc.).

Since HBase/Hadoop are modeled after Google's Bigtable and MapReduce, I suspect there is a way to do this, but I'm not sure how. Any suggestions?

Also, does what I'm trying to do make sense? Or are there better solutions for big-data graphs?

Lostsoul asked Mar 26 '12 01:03



People also ask

Is HBase a graph database?

HGraphDB is a client layer for using HBase as a graph database. It is an implementation of the Apache TinkerPop 3 interfaces.

How do graph databases store data?

Most graph database systems store data in a structure similar to linked lists: each record holds direct links to the records it is connected to, rather than being grouped with similar objects.

Can MongoDB store graph data?

MongoDB offers graph-traversal capabilities through its $graphLookup aggregation stage. Graph databases fulfill a need that traditional databases have left unmet: they prioritize relationships between entities.

What are the different types of graph database?

There are two popular models of graph databases: property graphs and RDF graphs. The property graph focuses on analytics and querying, while the RDF graph emphasizes data integration. Both types of graphs consist of a collection of points (vertices) and the connections between those points (edges).


1 Answer

You can store an adjacency list in HBase/Accumulo in a column-oriented fashion. I'm more familiar with Accumulo (HBase terminology might be slightly different), so you might use a schema similar to:

SrcNode(RowKey) EdgeType(CF):DestNode(CFQ) Edge/Node Properties(Value)

where CF = column family and CFQ = column qualifier.
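To make the edge schema concrete, here is a minimal Java sketch. It does not use the HBase or Accumulo client libraries: a nested TreeMap stands in for the table, and the names EdgeTableSketch, addEdge, neighbors and edgeProperties are hypothetical. It shows that reading one row (the source node) and one column family (the edge type) gives you that node's adjacency list for that edge type, with edge properties sitting in the cell values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory stand-in for an HBase/Accumulo table (NOT the real
 * client API): row key = source node, column family = edge type,
 * column qualifier = destination node, cell value = edge properties.
 */
public class EdgeTableSketch {
    // row key -> column family -> column qualifier -> cell value; TreeMaps keep
    // each level sorted, loosely mimicking how a wide-column store orders cells.
    private final Map<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    /** Stores the directed edge src -[edgeType]-> dst with its properties. */
    public void addEdge(String src, String edgeType, String dst, String edgeProps) {
        rows.computeIfAbsent(src, k -> new TreeMap<>())      // row key = source node
            .computeIfAbsent(edgeType, k -> new TreeMap<>()) // column family = edge type
            .put(dst, edgeProps);                            // qualifier = dest, value = props
    }

    /** Adjacency list for one edge type: all qualifiers in one family of one row. */
    public List<String> neighbors(String src, String edgeType) {
        TreeMap<String, String> cells = lookup(src, edgeType);
        return cells == null ? new ArrayList<String>() : new ArrayList<String>(cells.keySet());
    }

    /** Properties of a single edge, read from the cell value. */
    public String edgeProperties(String src, String edgeType, String dst) {
        TreeMap<String, String> cells = lookup(src, edgeType);
        return cells == null ? null : cells.get(dst);
    }

    // Returns the cells of one column family in one row, or null if absent.
    private TreeMap<String, String> lookup(String row, String family) {
        TreeMap<String, TreeMap<String, String>> families = rows.get(row);
        return families == null ? null : families.get(family);
    }

    public static void main(String[] args) {
        EdgeTableSketch table = new EdgeTableSketch();
        table.addEdge("alice", "follows", "bob", "weight=0.8");
        table.addEdge("alice", "follows", "carol", "weight=0.3");
        table.addEdge("alice", "likes", "dave", "weight=1.0");
        System.out.println(table.neighbors("alice", "follows"));               // [bob, carol]
        System.out.println(table.edgeProperties("alice", "follows", "carol")); // weight=0.3
    }
}
```

In a real table the same layout would be written through the client's put/mutation calls, and row keys, qualifiers and values would be byte arrays rather than strings.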

You might also store node/vertex properties as separate rows using something like:

Node(RowKey) PropertyType(CF):PropertyValue(CFQ) PropertyValue(Value)

The PropertyValue could be stored either in the CFQ or in the Value.
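A similar in-memory sketch (hypothetical class and method names, no real HBase/Accumulo API) shows the two places the property value can go. One way to think about the choice: keeping it in the Value suits a single-valued property such as a PageRank score, which is simply replaced on the next write; putting it in the CFQ gives each value its own column, which suits multi-valued properties such as a set of labels. The empty qualifier used for the Value layout is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of node-property rows (NOT the real client API). */
public class NodePropertySketch {
    // row key (node) -> column family (property type) -> qualifier -> cell value
    private final Map<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    // Creates the row and family on first write.
    private TreeMap<String, String> cellsForWrite(String node, String propertyType) {
        return rows.computeIfAbsent(node, k -> new TreeMap<>())
                   .computeIfAbsent(propertyType, k -> new TreeMap<>());
    }

    // Read-only lookup; returns an empty map when the row or family is absent.
    private TreeMap<String, String> cellsForRead(String node, String propertyType) {
        TreeMap<String, TreeMap<String, String>> families = rows.get(node);
        TreeMap<String, String> cells = families == null ? null : families.get(propertyType);
        return cells == null ? new TreeMap<String, String>() : cells;
    }

    // PropertyValue in the Value: one cell under an empty qualifier (an
    // assumption of this sketch), so writing again replaces the old value.
    public void setProperty(String node, String propertyType, String value) {
        cellsForWrite(node, propertyType).put("", value);
    }

    public String getProperty(String node, String propertyType) {
        return cellsForRead(node, propertyType).get("");
    }

    // PropertyValue in the CFQ: each value becomes its own column with an
    // empty cell value, so one node can hold several values, kept sorted.
    public void addPropertyValue(String node, String propertyType, String value) {
        cellsForWrite(node, propertyType).put(value, "");
    }

    public List<String> getPropertyValues(String node, String propertyType) {
        List<String> values = new ArrayList<>();
        for (String qualifier : cellsForRead(node, propertyType).keySet()) {
            if (!qualifier.isEmpty()) { // skip a value-in-Value cell, if any
                values.add(qualifier);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        NodePropertySketch nodes = new NodePropertySketch();
        nodes.setProperty("alice", "pagerank", "0.15");
        nodes.setProperty("alice", "pagerank", "0.27"); // replaces 0.15
        nodes.addPropertyValue("alice", "label", "person");
        nodes.addPropertyValue("alice", "label", "admin");
        System.out.println(nodes.getProperty("alice", "pagerank"));    // 0.27
        System.out.println(nodes.getPropertyValues("alice", "label")); // [admin, person]
    }
}
```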

From a graph-processing perspective, as mentioned by @Arnon Rotem-Gal-Oz, you could look at Apache Giraph, which is an implementation of Google's Pregel. Pregel is the system Google uses for large-scale graph processing.
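To give a feel for the Pregel model, the following single-machine Java sketch runs PageRank (one of the analytics mentioned in the question) as a sequence of supersteps: each vertex sums the messages sent to it in the previous superstep, updates its value, and sends its rank divided by its out-degree to its neighbors. It is not Giraph's API; the class and method names are hypothetical, 0.85 is the commonly used PageRank damping factor, and dangling vertices simply send nothing here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-machine sketch of Pregel-style PageRank (NOT Giraph's API). */
public class PregelPageRankSketch {

    /**
     * Runs PageRank for a fixed number of supersteps. Every vertex must appear
     * as a key in outEdges (use an empty list for vertices with no out-edges).
     */
    public static Map<String, Double> pageRank(Map<String, List<String>> outEdges, int supersteps) {
        final double damping = 0.85;
        int n = outEdges.size();

        // Initial values: every vertex starts with an equal share of the rank.
        Map<String, Double> rank = new HashMap<>();
        for (String v : outEdges.keySet()) {
            rank.put(v, 1.0 / n);
        }

        for (int step = 0; step < supersteps; step++) {
            // Send phase: each vertex sends rank / outDegree along its out-edges.
            Map<String, Double> inbox = new HashMap<>();
            for (String v : outEdges.keySet()) {
                inbox.put(v, 0.0);
            }
            for (Map.Entry<String, List<String>> e : outEdges.entrySet()) {
                List<String> targets = e.getValue();
                if (targets.isEmpty()) {
                    continue; // dangling vertex: sends nothing in this sketch
                }
                double share = rank.get(e.getKey()) / targets.size();
                for (String t : targets) {
                    inbox.merge(t, share, Double::sum);
                }
            }
            // Barrier: messages are consumed only after every vertex has sent.
            // Compute phase: each vertex combines its messages into a new value.
            Map<String, Double> next = new HashMap<>();
            for (String v : outEdges.keySet()) {
                next.put(v, (1 - damping) / n + damping * inbox.get(v));
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("b"));
        graph.put("b", Arrays.asList("a"));
        graph.put("c", Arrays.asList("b"));
        System.out.println(pageRank(graph, 30));
    }
}
```

In Giraph, the per-vertex update would live in a compute method and the supersteps would run distributed across workers; this sketch only mirrors the control flow on one machine.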

Support for using HBase/Accumulo as input to Giraph was submitted recently (7 Mar 2012) as a new feature request: HBase/Accumulo Input and Output formats (GIRAPH-153).

Binary Nerd answered Dec 08 '22 17:12
