
Ideas for Doing Aggregates in a graph database

I know there must be a lot of people out there doing this.

I'm working on a project using Neo4j. Let's say I have an entity called Photo. Now it's out on the internet and a million people like it. Putting those million likes into a graph and then navigating that graph to compute the aggregate, just so I can show the count, seems stupid. Indexes could make this more efficient, especially if they were used in computing the aggregates (as they are in SQL), but after much hunting around I don't think that is the case. Many of the aggregates are just relationship counts at specific nodes, but this still seems wrong (for instance, having a graph relationship going from the Photo to the Like event seems ugly).

Perhaps the best approach is to use the graph db only for what it's good at and put things like events in a SQL db. One counterargument is that I could go to all that trouble and then want an aggregate like 'how many friends of friends liked this?', and I am right back in the graph's backyard.

The choices out there seem to be either writing some Java or writing a bunch of Cypher queries.

Rob asked Mar 04 '26 10:03



1 Answer

Rob,

There are several options:

  • Some people have decided it is best to keep the graph data in the graph and the raw events in some other store, then derive the higher-level concepts and constructs from the event stream and materialize those in the graph.
  • Secondary indices that store the aggregate data are similar, but perhaps not as well integrated with the transactional graph.
  • It is also possible to use in-graph structures to represent aggregated values or access patterns. René Pickhardt showed that with the Graphity real-time tweet querying. The source for it is available on GitHub.

Often you have to look at your use cases: is it important to read all the likes, or do only a small number of them really matter? The same goes for the count: if it is read often, it makes sense to aggregate it (and keep it in sync) and read it from the aggregated place.

Due to the schema-less nature of the graph you can also evolve that over time. If you have just a few likes, it is faster and more sensible to calculate the number on the fly by counting relationships; when the like count grows beyond a certain number, you might migrate it into a property on the image itself.
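To make that migration concrete, here is a minimal sketch in plain Python. It is an in-memory stand-in for a photo node, not Neo4j code, and every name in it (`Photo`, `liked_by`, `like_count`, `MATERIALIZE_THRESHOLD`) is made up for illustration. The cutoff of 1000 is an arbitrary assumption you would tune against your own measurements.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical cutoff; tune it against real read/write measurements.
MATERIALIZE_THRESHOLD = 1000


@dataclass
class Photo:
    """In-memory stand-in for a photo node and its LIKES relationships."""
    liked_by: Set[str] = field(default_factory=set)  # one entry per LIKES relationship
    like_count: Optional[int] = None                 # stored property; None until materialized

    def add_like(self, user_id: str) -> None:
        if user_id in self.liked_by:
            return  # a user likes a photo at most once
        self.liked_by.add(user_id)
        if self.like_count is not None:
            # Already materialized: keep the property in sync on every write.
            self.like_count += 1
        elif len(self.liked_by) > MATERIALIZE_THRESHOLD:
            # Crossed the cutoff: migrate the count into a property.
            self.like_count = len(self.liked_by)

    def likes(self) -> int:
        if self.like_count is not None:
            return self.like_count   # read the stored property
        return len(self.liked_by)    # stand-in for counting relationships on the fly
```

Callers only ever use `likes()`, so the switch from counting to reading the property is invisible to them. In a real graph store, the counting branch is the expensive one, which is the reason to stop using it once a photo becomes popular.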

This might also be a time-driven approach. Shortly after a picture is posted, a lot happens around it, so you would want to keep the count up to date (remember that it does not really matter if the count is off by a few percent, so you can update lazily as well). After a while the picture no longer gets much attention, and it is safe to just aggregate the like count into a property.
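The lazy update could look roughly like the sketch below. Again this is plain Python with hypothetical names (`LazyLikeCounter`, `flush_every`), not a Neo4j API: likes are buffered and folded into the stored count in batches, so the displayed number may lag slightly behind the true one.

```python
class LazyLikeCounter:
    """Buffers like increments and folds them into the stored count in batches."""

    def __init__(self, flush_every: int = 100):
        self.stored_count = 0           # stand-in for a like_count property on the photo
        self.pending = 0                # likes recorded but not yet written to the property
        self.flush_every = flush_every  # hypothetical batch size

    def record_like(self) -> None:
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        # One write to the property instead of one write per like.
        self.stored_count += self.pending
        self.pending = 0

    def displayed_count(self) -> int:
        # May lag the true count by up to flush_every - 1 likes.
        return self.stored_count
```

A batch size is only one possible trigger. Flushing on a timer, or once activity on the photo dies down, would match the time-driven idea just as well.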

Michael Hunger answered Mar 08 '26 20:03
