How to design a database for a translation dictionary?

I have a database with words and phrases translated from, for example, English into 15 other languages, and also from every language in that list into the other 15. For now, each language pair is stored in one table like this (en -> de):

  • id_pair
  • word_en
  • word_de

What is the best way to design a database for such a huge list of words and phrases? I know I must separate each primary language from the others, and I was thinking of something like this:

ENGLISH
ID | WORD
1  | 'dictionary'

GERMAN
ID | WORD
1  | 'lexikon'
2  | 'wörterbuch'

TRANSLATION_EN_DE
ID_EN | ID_DE
1     | 1
1     | 2

Is this the best way to normalize the DB? And what about phrases? If someone enters the word "dictionary", I also need this to return "This dictionary is good" and its translation. (I know this can be found in the first table with an SQL query; is that the best way?)

I also need the entries in alphabetical order at all times. I will have a lot of new entries daily, and I want to print a couple of words before and after the word/phrase someone is looking up.

I'm stuck and can't decide on the best way to optimize it. Altogether these databases hold more than 15 GB of text-only translations and receive around 100k requests daily, so every ms counts. :) Any help will be appreciated, thanks!

asked Jun 04 '13 at 09:06 by Ivan Zg


1 Answer

With a separate table for each language, you'd need a large number of junction tables to cover all the possible translation combinations. On top of that, adding a new language would require adding more tables and rewriting the queries, client code, etc.
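To put a rough number on "a large number of junction tables": assuming 16 languages (English plus the 15 others from the question) and one junction table per unordered language pair, the count is C(16, 2) = 16 * 15 / 2 = 120. If each direction got its own table (TRANSLATION_EN_DE and TRANSLATION_DE_EN), that doubles to 240. The helper name below is hypothetical; it is just the arithmetic:

```python
from math import comb


def junction_tables(n_languages: int) -> int:
    """Junction tables needed when each unordered language pair
    gets its own TRANSLATION_xx_yy table (hypothetical helper)."""
    return comb(n_languages, 2)


# English plus the 15 other languages mentioned in the question
print(junction_tables(16))                        # 120
# A 17th language needs one new junction table per existing language
print(junction_tables(17) - junction_tables(16))  # 16
```

So every new language adds as many tables as there are existing languages, each needing its own queries.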

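It's better to do it in a more generalized way: a single WORD table holding the words of all languages (each row with its LANGUAGE_ID and WORD_TEXT), and a single TRANSLATION table in which each row links two words (WORD_ID1, WORD_ID2).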
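A minimal sketch of that generalized design, using Python's built-in sqlite3 module so it runs anywhere. LANGUAGE_ID, WORD_TEXT, WORD_ID1 and WORD_ID2 follow the answer; the LANGUAGE table's LANGUAGE_CODE column, the WORD_ID name and the `translations` helper are assumptions for illustration. The sample rows are the question's own dictionary / lexikon / wörterbuch example:

```python
import sqlite3

# Generalized schema sketch. LANGUAGE_ID, WORD_TEXT, WORD_ID1 and WORD_ID2
# follow the answer; WORD_ID and LANGUAGE_CODE are assumed names.
SCHEMA = """
CREATE TABLE LANGUAGE (
    LANGUAGE_ID   INTEGER PRIMARY KEY,
    LANGUAGE_CODE TEXT NOT NULL UNIQUE
);
CREATE TABLE WORD (
    WORD_ID     INTEGER PRIMARY KEY,
    LANGUAGE_ID INTEGER NOT NULL REFERENCES LANGUAGE (LANGUAGE_ID),
    WORD_TEXT   TEXT NOT NULL,
    UNIQUE (LANGUAGE_ID, WORD_TEXT)
);
CREATE TABLE TRANSLATION (
    WORD_ID1 INTEGER NOT NULL REFERENCES WORD (WORD_ID),
    WORD_ID2 INTEGER NOT NULL REFERENCES WORD (WORD_ID),
    PRIMARY KEY (WORD_ID1, WORD_ID2)
);
"""


def translations(conn, text, src, dst):
    """Translations of `text` from language `src` into `dst`,
    following only the stored WORD_ID1 -> WORD_ID2 direction."""
    rows = conn.execute(
        """
        SELECT w2.WORD_TEXT
        FROM WORD AS w1
        JOIN LANGUAGE AS l1 ON l1.LANGUAGE_ID = w1.LANGUAGE_ID
        JOIN TRANSLATION AS t ON t.WORD_ID1 = w1.WORD_ID
        JOIN WORD AS w2 ON w2.WORD_ID = t.WORD_ID2
        JOIN LANGUAGE AS l2 ON l2.LANGUAGE_ID = w2.LANGUAGE_ID
        WHERE l1.LANGUAGE_CODE = ? AND w1.WORD_TEXT = ? AND l2.LANGUAGE_CODE = ?
        ORDER BY w2.WORD_TEXT
        """,
        (src, text, dst),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO LANGUAGE VALUES (?, ?)", [(1, "en"), (2, "de")])
conn.executemany(
    "INSERT INTO WORD VALUES (?, ?, ?)",
    [(1, 1, "dictionary"), (2, 2, "lexikon"), (3, 2, "wörterbuch")],
)
conn.executemany("INSERT INTO TRANSLATION VALUES (?, ?)", [(1, 2), (1, 3)])
print(translations(conn, "dictionary", "en", "de"))  # ['lexikon', 'wörterbuch']

# A new language is just new rows: no new tables, no query changes
conn.execute("INSERT INTO LANGUAGE VALUES (3, 'fr')")
conn.execute("INSERT INTO WORD VALUES (4, 3, 'dictionnaire')")
conn.execute("INSERT INTO TRANSLATION VALUES (1, 4)")
print(translations(conn, "dictionary", "en", "fr"))  # ['dictionnaire']
```

This query only follows the direction in which a pair happens to be stored, so looking up "lexikon" from German into English returns nothing here; the answer's suggestion about the TRANSLATION table deals with representing both directions.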

Regarding the TRANSLATION table, I propose also creating a CHECK (WORD_ID1 < WORD_ID2) constraint and an index on {WORD_ID2, WORD_ID1} (the opposite "direction" from the PK), so that both directions of a translation are represented by a single row.

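Consider clustering the TRANSLATION table on its primary key if your DBMS supports that (MySQL's InnoDB engine, for instance, always clusters a table on its primary key).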
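Here is a sketch of those TRANSLATION refinements, again in SQLite via Python's sqlite3. It assumes SQLite's WITHOUT ROWID table as a stand-in for clustering (rows are stored in the primary-key B-tree); the index name TRANSLATION_REVERSE and the helpers `add_translation` and `translations_of` are hypothetical. Each pair is normalized so the smaller id goes into WORD_ID1, and a lookup checks both columns with a UNION ALL:

```python
import sqlite3

SCHEMA = """
CREATE TABLE WORD (
    WORD_ID     INTEGER PRIMARY KEY,
    LANGUAGE_ID INTEGER NOT NULL,
    WORD_TEXT   TEXT NOT NULL,
    UNIQUE (LANGUAGE_ID, WORD_TEXT)
);
-- WITHOUT ROWID stores rows in primary-key order (SQLite's form of clustering)
CREATE TABLE TRANSLATION (
    WORD_ID1 INTEGER NOT NULL REFERENCES WORD (WORD_ID),
    WORD_ID2 INTEGER NOT NULL REFERENCES WORD (WORD_ID),
    PRIMARY KEY (WORD_ID1, WORD_ID2),
    CHECK (WORD_ID1 < WORD_ID2)
) WITHOUT ROWID;
-- The opposite "direction" from the PK (hypothetical index name)
CREATE INDEX TRANSLATION_REVERSE ON TRANSLATION (WORD_ID2, WORD_ID1);
"""


def add_translation(conn, a, b):
    """Store the pair (a, b) as a single row, smaller id first."""
    if a == b:
        raise ValueError("a word cannot be its own translation")
    first, second = sorted((a, b))
    # The same pair entered again in either order is silently skipped
    conn.execute("INSERT OR IGNORE INTO TRANSLATION VALUES (?, ?)", (first, second))


def translations_of(conn, word_id, language_id):
    """Words in `language_id` linked to `word_id`, whichever column holds it."""
    rows = conn.execute(
        """
        SELECT w.WORD_TEXT FROM TRANSLATION AS t
        JOIN WORD AS w ON w.WORD_ID = t.WORD_ID2
        WHERE t.WORD_ID1 = :id AND w.LANGUAGE_ID = :lang  -- can use the PK
        UNION ALL
        SELECT w.WORD_TEXT FROM TRANSLATION AS t
        JOIN WORD AS w ON w.WORD_ID = t.WORD_ID1
        WHERE t.WORD_ID2 = :id AND w.LANGUAGE_ID = :lang  -- can use TRANSLATION_REVERSE
        ORDER BY 1
        """,
        {"id": word_id, "lang": language_id},
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO WORD VALUES (?, ?, ?)",
    [(1, 1, "dictionary"), (2, 2, "lexikon"), (3, 2, "wörterbuch")],
)
add_translation(conn, 3, 1)  # stored as (1, 3)
add_translation(conn, 1, 2)
add_translation(conn, 2, 1)  # same pair again: ignored
print(translations_of(conn, 1, 2))  # ['lexikon', 'wörterbuch']
print(translations_of(conn, 3, 1))  # ['dictionary']
```

Because of the strict `<` in the CHECK, a word never sits in both columns of the same row, so the UNION ALL cannot return duplicates, and a direct INSERT of a reversed pair such as (3, 1) is rejected.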

"Also need it alphabetically all time"

The query...

SELECT * FROM WORD WHERE LANGUAGE_ID = :lid ORDER BY WORD_TEXT

...can use the index underneath the UNIQUE constraint {LANGUAGE_ID, WORD_TEXT}.
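The question also asks for a couple of words before and after the one being looked up. A minimal sketch, assuming the same WORD columns: two keyset-style queries that walk backwards and forwards from the search term with a LIMIT. Both filter on LANGUAGE_ID and range over WORD_TEXT, so they can use the same {LANGUAGE_ID, WORD_TEXT} index, and new entries need no re-sorting because the index keeps them ordered. The `neighbors` helper and sample words are hypothetical. Note that SQLite's default BINARY collation compares bytes, so accented letters and capitals will not follow real dictionary order; a language-appropriate collation is an assumption you would need to check for your DBMS.

```python
import sqlite3

SCHEMA = """
CREATE TABLE WORD (
    WORD_ID     INTEGER PRIMARY KEY,
    LANGUAGE_ID INTEGER NOT NULL,
    WORD_TEXT   TEXT NOT NULL,
    UNIQUE (LANGUAGE_ID, WORD_TEXT)  -- its index can serve both queries below
);
"""


def neighbors(conn, language_id, text, n=2):
    """Up to n entries alphabetically before and after `text` in one language.
    Works even when `text` itself is not in the table."""
    before = conn.execute(
        "SELECT WORD_TEXT FROM WORD WHERE LANGUAGE_ID = ? AND WORD_TEXT < ? "
        "ORDER BY WORD_TEXT DESC LIMIT ?",
        (language_id, text, n),
    ).fetchall()
    after = conn.execute(
        "SELECT WORD_TEXT FROM WORD WHERE LANGUAGE_ID = ? AND WORD_TEXT > ? "
        "ORDER BY WORD_TEXT LIMIT ?",
        (language_id, text, n),
    ).fetchall()
    # `before` was read backwards, so flip it back into alphabetical order
    return [r[0] for r in reversed(before)], [r[0] for r in after]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
english = ["apple", "book", "cat", "dictionary", "dog", "egg", "fish"]
conn.executemany(
    "INSERT INTO WORD (LANGUAGE_ID, WORD_TEXT) VALUES (1, ?)",
    [(w,) for w in english],
)
conn.executemany(
    "INSERT INTO WORD (LANGUAGE_ID, WORD_TEXT) VALUES (2, ?)",
    [("lexikon",), ("wörterbuch",)],
)
print(neighbors(conn, 1, "dictionary"))  # (['book', 'cat'], ['dog', 'egg'])
print(neighbors(conn, 1, "dinosaur"))    # (['cat', 'dictionary'], ['dog', 'egg'])
```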

answered Sep 25 '22 at 04:09 by Branko Dimitrijevic