
Graph Database - How to deal with multilingual data

I'm trying to design a multilingual graph database, but I'm struggling to find an optimal model.

My current proposal is to make two node types: Movie and MovieTranslation.

Movie holds all relationships, such as likes, related, ratings and comments. MovieTranslation contains all translatable data (title, plot, genres). A Movie node does not contain these kinds of properties, only the original_title.

Movie and MovieTranslation are tied together by a translation relationship.

When I query nodes, I would check if they have a translation relationship with the queried locale (en_US for example). If true, merge the translation with the main node as the result.
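A minimal sketch of that lookup-and-merge step in application code, using plain dictionaries as hypothetical stand-ins for the Movie node and its MovieTranslation nodes. The `localize` function, the sample data, and the fallback to `original_title` when no translation exists are all assumptions for illustration, not part of the model described above.

```python
# Hypothetical in-memory stand-ins for one Movie node and its MovieTranslation nodes,
# keyed by locale (the key plays the role of the translation relationship).
movie = {"id": 1, "original_title": "Cidade de Deus"}
translations = {
    "en_US": {"title": "City of God", "plot": "Placeholder plot in English."},
    "pt_BR": {"title": "Cidade de Deus", "plot": "Placeholder plot in Portuguese."},
}


def localize(movie: dict, translations: dict, locale: str) -> dict:
    """Return a copy of the movie merged with the translation for `locale`, if any."""
    merged = dict(movie)  # copy, so the original movie data is left untouched
    translation = translations.get(locale)
    if translation is not None:
        merged.update(translation)
    else:
        # Assumed fallback: expose the original title when no translation exists.
        merged["title"] = movie["original_title"]
    return merged


print(localize(movie, translations, "en_US"))
```

The same shape works whether the two pieces come back as separate columns or as a single map from the query.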

I think this way might not be the best, but I can't think of a better one.

Do you guys have a better suggestion for the database model? It would be much appreciated.

I'm using neo4j, if you need this information.

Thanks, Vinicius.

asked Aug 23 '15 01:08 by Vinicius Tavares


2 Answers

I suggest moving the original title to its own node also, call it MovieTitle. "Complicating" your model in this way should actually "simplify" (or at least standardise) your queries because you're always looking in one place for film titles (also for indexing and searching).

You're assuming that films only have one original title, which isn't the case. A Korea-Japan co-production will have at least two original titles. Whole genres of Japanese cinema were released with different original Japanese titles in cinemas and on VHS.

Distinct from the idea of an original title is that of specific language titles. The same film released in different Chinese-speaking countries will have different Chinese-language titles that are deemed more marketable to the specific local audiences.

To get the original title:
MATCH (c:Country)<-[:HAS_NATIONALITY]-(m:Movie)-[:HAS_TITLE]->(t:MovieTitle)-[:HAS_NATIONALITY]->(c) WHERE m.id = 1 RETURN collect([t.title, c.country_code])

To get the original title in China:
MATCH (m:Movie)-[:HAS_TITLE]->(t:MovieTitle)-[:HAS_NATIONALITY]->(c:Country) WHERE c.country_code = "CN" RETURN m, collect([t.title, c.country_code])

To get all language titles:
MATCH (m:Movie)-[:HAS_TITLE]->(t:MovieTitle)-[:HAS_NATIONALITY]->(c:Country)-[:HAS_LANGUAGE]->(l:Language) RETURN m, collect([t.title, l.language_code])

To get all Chinese-language titles:
MATCH (m:Movie)-[:HAS_TITLE]->(t:MovieTitle)-[:HAS_NATIONALITY]->(c:Country)-[:HAS_LANGUAGE]->(l:Language) WHERE l.language_code = "zh" RETURN m, collect([t.title, c.name])

I would separate plot and genre into their own nodes. There is an argument that different national cinemas have unique genres, but if westerns and samurai dramas are both sub-genres of period dramas then you want to find them both on a period drama search.
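To make the sub-genre point concrete, here is a rough in-memory sketch of a search that walks up a genre hierarchy. The `PARENT` mapping, the movie names, and the edge direction (sub-genre pointing to parent) are all hypothetical choices for illustration, not something the model above prescribes.

```python
# Hypothetical sub-genre -> parent genre edges (the direction is an assumption).
PARENT = {
    "western": "period drama",
    "samurai drama": "period drama",
    "period drama": "drama",
}

# Hypothetical movies, each tagged with its most specific genre.
MOVIE_GENRES = {
    "Movie A": "western",
    "Movie B": "samurai drama",
    "Movie C": "romantic comedy",
}


def ancestors(genre):
    """Yield the genre itself, then each parent genre up the hierarchy."""
    seen = set()  # guards against accidental cycles in the data
    while genre is not None and genre not in seen:
        seen.add(genre)
        yield genre
        genre = PARENT.get(genre)


def movies_in_genre(genre):
    """Return movies tagged with `genre` or with any of its sub-genres."""
    return sorted(
        title for title, tagged in MOVIE_GENRES.items() if genre in ancestors(tagged)
    )


print(movies_in_genre("period drama"))
```

In the graph itself, a variable-length relationship match plays the role that `ancestors` plays here.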

I would still have the idea of Translation nodes, but don't confuse them with the domain you're modelling. They should be domain-ignorant and, for simple words/phrases like "romantic comedy", should almost be a third-party graph plug-in.

Get the French-language genre titles of a specific film:
MATCH (m:Movie)-[:HAS_GENRE*]->(g:Genre)-[:HAS_TRANSLATION]->(t:Translation)-[:HAS_LANGUAGE]->(l:Language) WHERE m.id = 100 AND l.language_code = "fr" RETURN collect(t.translation)

Get all romantic comedies:
MATCH (m:Movie)-[:HAS_GENRE*]->(g:Genre)-[:HAS_TRANSLATION]->(t:Translation) WHERE t.translation = "comédie romantique" RETURN m

Unlike movie titles and genres, plots are altogether simpler because you're modelling the film's story as a blob of text and not as a domain object in itself. Perhaps later you may do textual analysis on the plot texts to find themes, gender bias, etc., and model this in the graph as well.

Get the French language plot for a specific movie:
MATCH (m:Movie)-[:HAS_PLOT]->(p:Plot)-[:HAS_LANGUAGE]->(l:Language)-[:HAS_TRANSLATION]->(t:Translation) WHERE m.id = 100 AND t.translation = "French" RETURN p.plot

(Please treat the Cypher queries as pseudo-code. I didn't make a graph and test them.)

answered Nov 15 '22 12:11 by Stephen Cremin


I think the model is ok.

You can RETURN movie, translation or RETURN {movie:movie, translation:translation}

Converting nodes to maps and combining these maps is not yet supported; that's something on the roadmap.

How and where would you want to use the nodes? If for rendering, you can just access the two columns or entries. If for graph visualization, you can also combine them into a single node in the JSON source for the viz.
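For the visualization case, combining the two entries on the client could look roughly like this. The row shape mirrors the `{movie:movie, translation:translation}` return above; the `to_viz_node` helper, the sample values, and the rule that translation values win on key clashes are assumptions for illustration only.

```python
import json

# Hypothetical result rows shaped like RETURN {movie: movie, translation: translation}.
rows = [
    {
        "movie": {"id": 1, "original_title": "Cidade de Deus"},
        "translation": {"locale": "en_US", "title": "City of God"},
    },
    {"movie": {"id": 2, "original_title": "Untranslated Movie"}, "translation": None},
]


def to_viz_node(row: dict) -> dict:
    """Flatten one result row into a single node-like dict for a visualization."""
    node = dict(row["movie"])  # copy the movie properties
    # Assumed merge rule: translation values win on key clashes.
    node.update(row.get("translation") or {})
    return node


# Build the JSON source for the visualization from all rows.
nodes_json = json.dumps([to_viz_node(r) for r in rows], ensure_ascii=False)
print(nodes_json)
```

Rows without a translation simply keep the movie's own properties.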

answered Nov 15 '22 11:11 by Michael Hunger