
Translating to Languages with Irregular Rules

To make a PHP content management system extensible, language translations are crucial. I was researching programming approaches for a translation system, and Qt Linguist looked like a good example.

This is an example usage from the Qt documentation:

int n = messages.count();
showMessage(tr("%n message(s) saved", "", n));

Qt uses known language rules to determine whether "message" has an "s" appended in English.

When I showed that example to my development team, they found a problem that undermines the idea of modeling an extensible system on Qt's tr() function.

This is a similar example, except that something is now seriously wrong.

int n = deadBacteria.count();
showMessage(tr("%n bacterium(s) killed", "", n));

The plural of "bacterium" is "bacteria". It is improper to append an "s".

I don't have much experience with Qt Linguist, and I haven't seen how it handles irregular plurals and conjugations.

A more complicated phrase could be "%n cactus(s) have grown.". The plural should be "cacti", and "have" needs to be conjugated to "has" if there is one cactus.

You might think that the logical fix is to avoid these irregular words because they are not used in programming. That does not help, for two reasons:

  1. Perhaps there is a language that modifies nouns in an irregular way, even though the source string works in English, like "%n message(s) saved". In MyImaginaryLanguage, the proper way to form the translated string could be "1Message saved", "M2essage saved", "Me3ssage saved" for %n values 1, 2, and 3, respectively, and it doesn't look like Qt Linguist has rules to handle this.
  2. To make a CMS as extensible as I need mine to be, all types of web applications need to be factored in. Somebody may build a role-playing game that requires sentences to be constructed like "5 cacti have grown." Or maybe a security application wants to say, "ClamAV found 2 viruses." as opposed to "ClamAV found 2 virus(es)."

After searching online to see if other Qt developers have a solution to this problem and not finding any, I came to Stack Overflow.

I want to know:

  1. What extensible and effective programming technique should be used to translate strings with possible irregular rules?
  2. What do Qt programmers and translators do if they encounter this irregularity issue?
Deltik asked Mar 20 '12 00:03



1 Answer

You've misunderstood how the pluralisation in Qt works: it's not an automatic translation.

Basically, you have a default string in your code, e.g. "%n cactus(s) have grown.", which is just a literal. You can put whatever the hell you like in it, e.g. "dingbat wibble foo %n bar".

You may then define translation languages (including one for the same language you've written the source string in).

Linguist is programmed with the various rules for how languages treat quantities of something. In English it's just singular or plural, but if a language has a specific form for zero or whatever, Linguist presents a separate field for each of those forms. It then lets you enter whatever the correct sentence would be in the target language for each form, and it handles putting the %n in wherever you decide it should go in the translated text.
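To make the "rules for quantities" idea concrete, here is a minimal sketch of what such a rule boils down to: a function from the count to the index of the form the translator filled in. The function names are hypothetical and the rules are the commonly cited ones for these three languages; this is not copied from Qt's implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers: each maps a count to the index of a translated form.

// English: form 0 for exactly one item, form 1 for everything else.
std::size_t pluralIndexEnglish(int n) {
    assert(n >= 0);
    return n == 1 ? 0 : 1;
}

// French: zero and one both take form 0, larger counts take form 1.
std::size_t pluralIndexFrench(int n) {
    assert(n >= 0);
    return n <= 1 ? 0 : 1;
}

// Polish: three forms. Form 0 for one item, form 1 for counts ending
// in 2-4 (but not 12-14), form 2 for everything else.
std::size_t pluralIndexPolish(int n) {
    assert(n >= 0);
    if (n == 1) return 0;
    const int lastDigit = n % 10;
    const int lastTwoDigits = n % 100;
    if (lastDigit >= 2 && lastDigit <= 4 &&
        (lastTwoDigits < 10 || lastTwoDigits >= 20)) {
        return 1;
    }
    return 2;
}
```

Note that the rule only picks which sentence to use. It never touches the words themselves, which is why irregular nouns are not a problem.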

So whoever does the translation in Linguist would be provided the source, and has to fill in the singular and plural, for example.

Source text: %n cactus(s) have grown.

English translation (Singular): %n cactus has grown.

English translation (Plural): %n cacti have grown.

If the application can't find an installed translation, it falls back to the source literal. The source literal is also what the translator sees, so they have to infer what you meant from it. Hence "dingbat wibble foo %n bar" might not be a good idea when describing how many cacti have grown.
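If you want to model this in your own system, the lookup could be sketched roughly as follows. This is plain standard C++, not Qt's API: the Catalog type and the names translatePlural, substituteCount and englishPluralIndex are made up for illustration, and the English rule is hard-coded as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical catalog: source literal -> forms typed in by the translator.
using Catalog = std::map<std::string, std::vector<std::string>>;

// Replace every "%n" in text with the decimal value of n.
std::string substituteCount(std::string text, int n) {
    const std::string marker = "%n";
    const std::string value = std::to_string(n);
    std::size_t pos = 0;
    while ((pos = text.find(marker, pos)) != std::string::npos) {
        text.replace(pos, marker.size(), value);
        pos += value.size();  // continue after the inserted digits
    }
    return text;
}

// Assumed English rule: index 0 for exactly one item, index 1 otherwise.
std::size_t englishPluralIndex(int n) { return n == 1 ? 0 : 1; }

// Pick the translator's form for n. If the catalog has no usable entry,
// fall back to the source literal, still substituting the count.
std::string translatePlural(const Catalog& catalog,
                            const std::string& source, int n) {
    assert(n >= 0);
    const auto entry = catalog.find(source);
    const std::size_t index = englishPluralIndex(n);
    if (entry == catalog.end() || index >= entry->second.size()) {
        return substituteCount(source, n);
    }
    return substituteCount(entry->second[index], n);
}
```

With a catalog that maps "%n cactus(s) have grown." to the two English forms shown above, a count of 1 gives "1 cactus has grown." and a count of 5 gives "5 cacti have grown.". With an empty catalog you get the ugly but readable "5 cactus(s) have grown.".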

Further reading:

  • The Linguist manual
  • The Qt Quarterly article on Plural Form(s) in Translation(s)
  • The Internationalization example or the I18N example
  • Download the SDK and have a play.
Samuel Harmer answered Sep 30 '22 16:09