I've found some advice for setting up tagging systems in relational and document databases, but nothing for graph/multi-model databases.
I am trying to set up a tagging system for documents (let's call them "articles") in ArangoDB. I can think of two obvious ways to store tags in a multi-model (graph+document) database like Arango:
Are these in fact the two main ways to do this? Neither seems ideal. For example:
Which leads me to an explicit question: with regard to the latter option, is there any simple way to automatically make connected 'tag' documents show up within the article documents? E.g. have an array property that somehow 'mirrored' the tag.name properties of the connected tag documents?
General advice is also welcome.
Tags are stored in table "label" and once we add a tag to a ticket or anything, that gets stored in "label_entry" table.
In a graph database, there are no JOINs or lookups. Relationships are stored natively alongside the data elements (the nodes) in a much more flexible format. Everything about the system is optimized for traversing through data quickly; millions of connections per second, per core.
Graph databases store data like object-oriented languages. Each object can maintain a collection of other objects it is related to. These references are usually pointers to objects in-memory, and we do not have to store them explicitly. Nor do we have to find the object in memory with some foreign key attribute.
Additionally, they were considered to be “academic” databases, designed to build logical analysis systems, and not necessarily useful for business purposes. Though graph databases could provide useful results, in general they were complicated, time-consuming, and not terribly user-friendly.
You have already mentioned most of the relevant decision criteria. Maybe I can add some more:
Tags embedded inside the documents can be covered by an array index, which makes filtering on them fast. However, a flat tag array leaves no room to attach a rating or an explanation to each item. Counting the documents that carry a tag may also be more expensive this way than counting all edges that originate from a specific tag, and the same goes for finding all tags that match a search criterion.
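To make the trade-off concrete, here is a minimal sketch of both lookups. It assumes each article keeps a flat array of tag strings in a `tags` attribute with an array index on `tags[*]`, and that a hypothetical edge collection named `tagged` holds edges pointing from `tags/<key>` to `articles/<key>`. None of these names come from the question, so adjust them to your schema.

```aql
// Embedded variant: an array index on "tags[*]" can serve this filter.
FOR a IN articles
  FILTER 'graph' IN a.tags
  RETURN a._key
```

```aql
// Edge variant: count the articles carrying one tag by counting
// the edges that originate from it (the edge index covers _from).
FOR e IN tagged
  FILTER e._from == 'tags/graph'
  COLLECT WITH COUNT INTO numArticles
  RETURN numArticles
```

The two queries are run separately. The second one never touches the article documents, which is why counting via edges can be cheaper.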
One of the strengths of multi-model is that you don't need to decide between the two approaches. You can have an edge collection that connects tags to your documents and carries attributes, and also keep an indexed array with the same (flat) tags inside each document. If you find that all (or most) of your queries use just one method, try to convert the rest and remove the other solution. If that doesn't work, your application simply needs both of them.
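If the edges are the source of truth, the flat tag list can also be derived at query time instead of being stored. This is a sketch under assumptions: the edge collection `tagged` (edges from tag to article), the vertex collection `tags` and its `name` attribute are hypothetical names, not taken from the question.

```aql
// WITH declares the vertex collection the traversal reads from
// (required in a cluster, harmless on a single server).
WITH tags
FOR a IN articles
  // Walk one step against the edge direction: article <- tag
  LET tagNames = (
    FOR t IN 1..1 INBOUND a tagged
      RETURN t.name
  )
  // Return the article with a computed "tags" array merged in
  RETURN MERGE(a, { tags: tagNames })
```

This does not keep a stored array in sync automatically. It only builds the mirrored array whenever the query runs. The same subquery could feed an UPDATE to persist the array, at the cost of having to rerun it whenever the edges change.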
In both cases, finding other documents that share the same tags can be done in a subquery:
// Articles matching the fulltext search (needs a fulltext index on articles.text)
LET docs = (FOR ftDoc IN FULLTEXT(articles, 'text', 'search') RETURN ftDoc)
// Distinct tags carried by those articles
LET tags = UNIQUE(FLATTEN(docs[*].tags))
// Keys of every article sharing at least one of those tags
LET otherArticles = (FOR oneTag IN tags
  FOR oneD IN articles FILTER oneTag IN oneD.tags RETURN DISTINCT oneD._key)
RETURN {articles: docs, tags: tags, otherArticles: otherArticles}