
Adjustable, versioned graph database

I'm currently working on a project where I use natural language processing to extract emotions from text to correlate them with contextual information.

Definition of contextual information: Any information that is relevant to describing an entity's situation in time and space.

Description of the data structure I'm looking for:

There is an arbitrary number of entities (an entity can be either a person or a group, for example a Twitter hashtag) for which I want to track contextual information and their conversations with other entities. Conversations between entities are processed in order to classify their emotional features. Basic emotional features consist of a vector that specifies how strongly each emotion occurs, on a scale from 0 to 1: {fear: 0.1, happiness: 0.4, joy: 0.1, surprise: 0.9, anger: 0}

Entities can also submit any contextual information they'd like to share, for example location, room temperature, blood pressure, and so on (I will refer to these as contextual variables). Because neither the number of conversations of an entity nor the number of contextual variables it wants to share is known at any point in time, the data structure needs to be able to adjust accordingly.
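As a rough illustration of these requirements, here is a minimal Python sketch of one possible in-memory shape for the data. The class and field names (`Conversation`, `ContextReading`, `Entity`) and the sample values are assumptions made for this example only; they do not come from any particular database.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class Conversation:
    """One classified conversation between two or more entities."""
    participants: frozenset[str]
    at: datetime
    emotions: dict[str, float]  # e.g. {"fear": 0.1, "happiness": 0.4}


@dataclass(frozen=True)
class ContextReading:
    """One value of one contextual variable, as submitted by an entity."""
    variable: str  # free-form name, so a new variable needs no schema change
    value: Any
    at: datetime


@dataclass
class Entity:
    """A person or group. Both lists only grow, so no earlier state is overwritten."""
    name: str
    conversations: list[Conversation] = field(default_factory=list)
    context: list[ContextReading] = field(default_factory=list)


# Hypothetical sample data
bob = Entity("Bob")
now = datetime(2015, 2, 19, 12, 0)
bob.context.append(ContextReading("room-temperature", 21.5, now))
bob.context.append(ContextReading("location", "office", now))
bob.conversations.append(
    Conversation(
        frozenset({"Bob", "Alice"}),
        now,
        {"fear": 0.1, "happiness": 0.4, "joy": 0.1, "surprise": 0.9, "anger": 0.0},
    )
)
```

Because every reading and conversation carries its own timestamp and nothing is updated in place, each change remains available as a separate state.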

Important: Every change in the data must also be recorded as its own state, because I want to correlate certain changes in state with each other.

Example: Bob and Alice have a conversation that shows a high magnitude of fear. A couple of hours later they have another conversation that shows no more fear, but happiness. One could argue that a high magnitude of fear followed by happiness can be interpreted as the emotion relief.

However, to extract this kind of information I need to be able to correlate different states with each other. The same goes for correlating contextual information with the emotions tracked in conversations. This is why every state change must be recorded and remain available.

To make this clearer, I've created a graphic and attached it to the question.

[Image: graphic attached to the question]

Now, the actual question I have is: Which database or data structure can I use to solve this problem? I've looked into event-sourcing databases but wasn't convinced that I could easily recreate a graph structure with them. I also looked at graph databases but didn't find what I was looking for.

Therefore it would be nice if someone here could at least point me in the right direction or help me adjust my structure to solve the problem. If, however, there are data stores that support what I would call graph databases with snapshots, then ease of use is probably the most important feature to filter for.

Tim Daubenschütz asked Feb 19 '15 12:02



2 Answers

There's a database called Datomic by Rich Hickey (of Clojure fame) that stores facts over time. Every entry in the database is a fact with a timestamp, append-only as in Event Sourcing.
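To make the idea of append-only, timestamped facts concrete, here is a minimal Python sketch. It is not Datomic's API; `Fact`, `FactLog`, `assert_fact`, and `as_of` are hypothetical names chosen for this illustration, and a simple counter stands in for a real timestamp.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Fact:
    entity: str
    attribute: str
    value: Any
    tx: int  # logical timestamp (assumption: a counter instead of wall-clock time)


class FactLog:
    """Append-only list of facts; nothing is ever updated or deleted."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []
        self._next_tx = 1

    def assert_fact(self, entity: str, attribute: str, value: Any) -> int:
        """Record a new fact and return the logical time it was recorded at."""
        tx = self._next_tx
        self._next_tx += 1
        self._facts.append(Fact(entity, attribute, value, tx))
        return tx

    def as_of(self, entity: str, attribute: str, tx: int) -> Optional[Any]:
        """Latest value recorded at or before `tx`, or None if there is none."""
        result = None
        for fact in self._facts:  # facts are stored in increasing tx order
            if fact.tx > tx:
                break
            if fact.entity == entity and fact.attribute == attribute:
                result = fact.value
        return result


# Hypothetical sample data
log = FactLog()
t1 = log.assert_fact("bob", "location", "office")
t2 = log.assert_fact("bob", "location", "home")
earlier = log.as_of("bob", "location", t1)  # value as it was at t1
latest = log.as_of("bob", "location", t2)   # value as it was at t2
```

Since old facts are never overwritten, any earlier state can be reconstructed by reading the log only up to the time of interest.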

These facts can be queried with a relational/logical language à la Datalog (reminiscent of Prolog). Please see this post by kisai for a quick overview. Datomic has been used for querying graphs with some success in the past: Using Datomic as a Graph Database.

While I have no experience with Datomic, it does seem to be quite suitable for your particular problem.

Joakim Ahnfelt-Rønne answered Oct 30 '22 22:10



You have an interesting project. I do not work on things like this directly, but here are my 2 cents:

It seems to me your picture is a bit flawed. You are trying to represent a graph database over time, but there isn't really a way to represent time this way. If we examine the image, you have conversations and context data changing over time, but the facts "Bob", "Alice", and "Malory" don't actually change over time. So let's remove them from the equation.

Instead, focus on the things you can model over time: a conversation, a context, a location. These things will change as new data comes in, which makes them excellent candidates for an event-sourced model. In your app, a conversation would be modeled as a series of individual events, which your aggregate would combine to generate a final state, such as your 'relief' determination.

For example, you could write logic where, if a conversation was angry and then a very happy event came in, the subject is now feeling relief.
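A rule like that could look roughly like the following Python sketch. The function name `shows_relief`, the 0.5 threshold, and the choice of emotion keys are assumptions for illustration; a real classifier would need tuned values.

```python
from typing import Iterable, Mapping


def shows_relief(
    events: Iterable[Mapping[str, float]],
    negative: str = "anger",
    positive: str = "happiness",
    threshold: float = 0.5,  # assumed cut-off for a "strong" emotion
) -> bool:
    """Return True if a strongly negative event is later followed by a
    strongly positive event that is no longer strongly negative.

    `events` are emotion vectors in chronological order.
    """
    seen_negative = False
    for emotions in events:
        is_negative = emotions.get(negative, 0.0) >= threshold
        is_positive = emotions.get(positive, 0.0) >= threshold
        if seen_negative and is_positive and not is_negative:
            return True
        if is_negative:
            seen_negative = True
    return False


# Hypothetical event streams, oldest event first
angry_then_happy = [{"anger": 0.7}, {"happiness": 0.9, "anger": 0.0}]
happy_then_angry = [{"happiness": 0.9}, {"anger": 0.7}]
relieved = shows_relief(angry_then_happy)
not_relieved = shows_relief(happy_then_angry)
```

Passing `negative="fear"` would cover the fear-then-happiness case from the question with the same logic.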

What I would do is model these conversation states in your graph DB, connected to your 'Fact' objects "Bob", "Alice", etc. A query such as 'What is Alice feeling right now?' would then be a graph traversal through your conversation states, factoring in the context data connected to Alice.

To answer a question such as 'What was Alice feeling 5 minutes ago?', you would take all the event streams for the conversations, rewind them to the appropriate point, and then examine the state of the conversations.
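Rewinding could be sketched like this in Python, assuming each conversation has its own chronologically sorted event stream. The `rewind` function, the stream name, and the convention that a conversation's "state" is simply its latest emotion vector are all assumptions for this example.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

Event = tuple[datetime, dict[str, float]]  # (timestamp, emotion vector)


def rewind(
    streams: dict[str, list[Event]], when: datetime
) -> dict[str, dict[str, float]]:
    """For each stream, return the last emotion vector recorded at or
    before `when`. Streams with no event by then are left out."""
    snapshot: dict[str, dict[str, float]] = {}
    for name, events in streams.items():
        timestamps = [ts for ts, _ in events]  # assumed sorted by time
        cut = bisect_right(timestamps, when)   # events[:cut] are at or before `when`
        if cut:
            snapshot[name] = events[cut - 1][1]
    return snapshot


# Hypothetical sample data
now = datetime(2015, 2, 19, 12, 0)
streams = {
    "bob-alice": [
        (now - timedelta(hours=3), {"fear": 0.9}),
        (now - timedelta(minutes=2), {"happiness": 0.8}),
    ]
}
five_minutes_ago = rewind(streams, now - timedelta(minutes=5))
right_now = rewind(streams, now)
```

The full streams stay untouched; a query for a past moment only decides how much of each stream to read.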

TL;DR: Separate the time-dependent variables from the time-independent variables and use event sourcing to model time.

Charles answered Oct 30 '22 21:10
