Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to support Multi-Languages approach in DataBase Schema?

I want my database to support multi Languages for all text values in its tables.

So what is the best approach to do that?.

Edit1::

E.G.

I've this "Person" table:

ID int
FirstName  nvarchar(20)
LastName   nvarchar(20)
Notes      nvarchar(max)
BirthDate  date
...........

So if i want my program to support new language "let say French".

should i add new column every time i add new language ? So my "Person" table will look like this

ID int
FirstName_en  nvarchar(20)
FirstName_fr  nvarchar(20)
LastName_en   nvarchar(20)
LastName_fr   nvarchar(20)
Notes_en      nvarchar(max)
Notes_fr      nvarchar(max)
BirthDate     date
...........

Or should i add 2 new tables one for languages and other for "Person_Languages" values ?

So this will look like : " Languages " table:

ID           int
Lang-symbol  nvarchar(4)

" Person " Table:

ID         int
BirthDate  Date

and finally " Person_Translation " table:

LangID        int
PersonID      int
Translation   nvarchar(max)

Or there is something better ??

.

like image 877
Wahid Bitar Avatar asked Dec 30 '09 13:12

Wahid Bitar


2 Answers

I have had to deal with this in a questionaire database. Multiple questionaires needed to be translated in multiple languages (English, Japanese, Chinese).

We first identified all text columns that would be printed out on the questionaires. For all these we would need to be able to store a translation. For each table having text columns that would require translations, we then created a _translations table, having a foreign key to point to the primary key of the original table, a foreign key to our language table, and then a unicode column for each text field that would require translation. In these text columns we would store the translations for each language we needed.

So a typical query would look like:

select     p.id
,          pt.product_name
,          pt.product_description
from       product                  p
inner join product_translations pt
on         p.id = pt.product_id
and        'fr' = pt.language_code

So, always just one join extra (for each table) to obtain the translations.

I should point out that we only had tto deal with a limited amount of tables, so it was not a big issue to maintain a few extra %_translations tables.

We did consider adding columns for the new language, but decided against it for a coouple of reasons. First of all the number of languages to be supported was not known, but could be substantial (10, 20 languages or maybe more). Combined with the fact that most tables had at least 3 distinct human readable columns, we would have to add many, many text columns which would result in very wide rows. So we decided not to do that.

Another approach we considered as to make one big "label" table, having the columns:

( table_name , id_of_table , column_name , language_id , translated_text)

effectively having one table to store all translations anywhere in the database. We decided against that too, because it would complicate writing queries (as each 'normal' column would result in one row in the translation table, which would result in effectively joining the already large translation table multiuple times to the normal table (once for each translated column). For your example table you would get queries like this:

select     product.id 
,          product_name.translated_text product_name
,          product_description.translated_text product_description
from       product p
inner join translations product_name
on         p.id = product_name.id
and        'product'      = product_name.table_name
and        'product_name' = product_name.column_name
and        'fr'           = product_name.language
inner join translations product_description
on         p.id = product_name.id
and        'product'      = product_description.table_name
and        'product_description' = product_description.column_name
and        'fr'           = product_description.language

as you can see, essentially this kind of like an entity-attribute-value design, which makes it cumbersome to query.

Another problem of that last approach is that it would make it hard if not impossible to enforce constraints on translated text (in our case mainly unicity constraints). With a separatee table for the translations, you can easily and cleanly overcome those problems.

like image 77
Roland Bouman Avatar answered Nov 15 '22 20:11

Roland Bouman


I've just implemented something which is working very well.

Like many others, I've not had a clean slate to start with, and all tables were geared up to store English text.

I was not particularly happy to double the number of tables, or columns to achieve a multilingual database.

I decided to utilise XML as a way to store all translations, into a single field:

<text>
  <translation lang="EN-GB">good day</translation>
  <translation lang="FR-FR">bonjour</translation>
</text>

So for example, I started with a table which only holds English:

ProductId INT
ProductName varchar(200)
ProductDescription varchar(1000)

I then created the multilingual fields:

ProductId INT
ProductNameTranslation xml
ProductDescriptionTranslation xml

If you want to be able to get a read-only value for the original fields, you can simply add a persisted computed columns:

ProductId INT
ProductNameTranslation xml
ProductDescriptionTranslation xml
ProductNameDefaultLang = dbo.GetDefaultLanguage(ProductNameTranslation)
ProductDescriptionDefaultLang = dbo.GetDefaultLanguage(ProductDescriptionTranslation)

In the business/data layer, the XML field is converted in to a dictionary type business class, with key being a language enumeration:

ProductName {
  get {return this.ProductNameTranslation[selectedLanguage];}
  set {...}
}

This approach allowed me to avoid having to completely re-carve up the database. It also means less interaction with the database, one row to be saved instead of one main row, and 3 translation rows.

Also it avoids the issue of having to handle cases where the main row has no translation rows

Should be noted that the XML had a schema and XML indexes to boost performance.

This approach is very useful where you want to do the language conversion incrementally i.e. one field at a time.

like image 36
Sam Avatar answered Nov 15 '22 21:11

Sam