Graph Database - How to deal with multilingual data

Tags:

graph

graph-databases

neo4j

I'm trying to design a multilingual graph database, but I'm struggling to find an optimal model.

My current proposal is to make two node types: Movie and MovieTranslation.

Movie holds all relationships, such as likes, related, ratings and comments. MovieTranslation contains all translatable data (title, plot, genres). A Movie node does not contain these kinds of properties, only the original_title.

Movie and MovieTranslation are tied together by a translation relationship.

When I query nodes, I would check whether they have a translation relationship for the queried locale (en_US, for example). If so, I would merge the translation with the main node in the result.

I think this way might not be the best, but I can't think of a better one.

Do you guys have a better suggestion for the database model? It would be much appreciated.

I'm using neo4j, if you need this information.

Thanks, Vinicius.


asked Aug 23 '15 01:08

Vinicius Tavares

2 Answers

I suggest moving the original title to its own node also, call it MovieTitle. "Complicating" your model in this way should actually "simplify" (or at least standardise) your queries because you're always looking in one place for film titles (also for indexing and searching).

You're assuming that films only have one original title which isn't the case. A Korea-Japan co-production will have at least two original titles. Whole genres of Japanese cinema were released with different original Japanese titles in cinemas and on VHS.

Distinct from the idea of an original title is that of specific language titles. The same film released in different Chinese-speaking countries will have different Chinese-language titles that are deemed more marketable to the specific local audiences.

To get the original title:
MATCH (c:Country)<-[:HAS_NATIONALITY]-(m:Movie)-[:HAS_TITLE]->(t:MovieTitle)-[:HAS_NATIONALITY]->(c) WHERE m.id = 1 RETURN COLLECT({title: t.title, country_code: c.country_code})

To get the original title in China:
MATCH (m:Movie)-[:HAS_TITLE]->(t:MovieTitle)-[:HAS_NATIONALITY]->(c:Country) WHERE c.country_code = "CN" RETURN m, COLLECT({title: t.title, country_code: c.country_code})

To get all language titles:
MATCH (m:Movie)-[:HAS_TITLE]->(t:MovieTitle)-[:HAS_NATIONALITY]->(c:Country)-[:HAS_LANGUAGE]->(l:Language) RETURN m, COLLECT({title: t.title, language_code: l.language_code})

To get all Chinese-language titles:
MATCH (m:Movie)-[:HAS_TITLE]->(t:MovieTitle)-[:HAS_NATIONALITY]->(c:Country)-[:HAS_LANGUAGE]->(l:Language) WHERE l.language_code = "zh" RETURN m, COLLECT({title: t.title, country: c.name})

I would separate plot and genre into their own nodes. There is an argument that different national cinemas have unique genres, but if westerns and samurai dramas are both sub-genres of period dramas then you want to find them both on a period drama search.
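To illustrate why a genre hierarchy helps, here is a rough Python sketch of the same idea outside the database. The `PARENT` and `MOVIE_GENRE` mappings and both helper functions are hypothetical names made up for this example; in Neo4j the variable-length `HAS_GENRE` pattern would do the walking for you.

```python
# Hypothetical data: child genre -> parent genre, and movie -> direct genre.
# Assumes the hierarchy is acyclic; a cycle would make is_within loop forever.
PARENT = {"western": "period drama", "samurai drama": "period drama"}
MOVIE_GENRE = {
    "The Searchers": "western",
    "Seven Samurai": "samurai drama",
    "Alien": "science fiction",
}


def is_within(genre, ancestor):
    """Return True if `genre` is `ancestor` or one of its sub-genres."""
    while genre is not None:
        if genre == ancestor:
            return True
        genre = PARENT.get(genre)  # step up one level; None at the top
    return False


def movies_in(ancestor):
    """List movies whose genre sits anywhere under `ancestor`, sorted by title."""
    return sorted(m for m, g in MOVIE_GENRE.items() if is_within(g, ancestor))


print(movies_in("period drama"))  # ['Seven Samurai', 'The Searchers']
```

A search on the parent genre finds both national sub-genres, while a search on "western" alone still returns only the western.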

I would still keep the idea of Translation nodes, but don't confuse them with the domain you're modelling. The translation layer should be domain-ignorant and, for simple words and phrases like "romantic comedy", should almost be a third-party graph plug-in.

Get the French-language genre titles of a specific film:
MATCH (m:Movie)-[:HAS_GENRE*]->(g:Genre)-[:HAS_TRANSLATION]->(t:Translation)-[:HAS_LANGUAGE]->(l:Language) WHERE m.id = 100 AND l.language_code = "fr" RETURN COLLECT(t.translation)

Get all romantic comedies:
MATCH (m:Movie)-[:HAS_GENRE*]->(g:Genre)-[:HAS_TRANSLATION]->(t:Translation) WHERE t.translation = "comédie romantique" RETURN m

Unlike movie titles and genres, plots are altogether simpler because you're modelling the film's story as a blob of text and not as domain objects in themselves. Perhaps later you may do textual analysis on the plot texts to find themes, gender bias, etc., and model this in the graph as well.

Get the French language plot for a specific movie:
MATCH (m:Movie)-[:HAS_PLOT]->(p:Plot)-[:HAS_LANGUAGE]->(l:Language)-[:HAS_TRANSLATION]->(t:Translation) WHERE m.id = 100 AND t.translation = "French" RETURN p.plot

(Please treat the Cypher queries as pseudo-code. I didn't make a graph and test them.)

answered Nov 15 '22 12:11

Stephen Cremin

I think the model is ok.

You can RETURN movie, translation or RETURN {movie:movie, translation:translation}.

Converting nodes to maps and combining those maps is not supported yet; it's on the roadmap.

How and where would you want to use the nodes? If for rendering, you can just access the two columns or entries. If for graph visualization, you can also combine them into a single node in the JSON source for the viz.
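If you go the two-column route, the merge can happen in application code. Below is a minimal Python sketch, assuming the query returns the movie and its translation as separate columns. The TRANSLATION relationship type, the `locale` and `id` properties, and the `merge_translation` helper are illustrative assumptions, not part of any Neo4j API, and plain dicts stand in for whatever records your driver returns.

```python
from typing import Any, Mapping, Optional

# Hypothetical query for the two-node model; the relationship type and the
# `id` / `locale` properties are assumptions, not taken from a real schema.
QUERY = """
MATCH (m:Movie {id: $id})
OPTIONAL MATCH (m)-[:TRANSLATION]->(t:MovieTranslation {locale: $locale})
RETURN m AS movie, t AS translation
"""


def merge_translation(
    movie: Mapping[str, Any],
    translation: Optional[Mapping[str, Any]],
) -> dict:
    """Overlay translated properties on a copy of the movie's properties."""
    merged = dict(movie)
    if translation is None:
        # No translation for the requested locale: fall back to original_title.
        merged.setdefault("title", movie.get("original_title"))
        return merged
    merged.update(translation)
    return merged


# Plain dicts stand in for the two columns a driver would return.
movie = {"id": 1, "original_title": "Cidade de Deus"}
translation = {"locale": "en_US", "title": "City of God"}

print(merge_translation(movie, translation)["title"])  # City of God
print(merge_translation(movie, None)["title"])         # Cidade de Deus
```

OPTIONAL MATCH keeps the movie row even when no translation exists for the locale, so the fallback to `original_title` is a decision made in the client rather than in the query.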

answered Nov 15 '22 11:11

Michael Hunger
