I want to make a multi-language site, such that all or almost all pages will be available in 2 or more translations. What are the best practices to follow?
For example, I consider these language selection mechanisms:
Accept-Language
header if the cookie is not set.Is there anything else?
How should different translations be served?
LANG.example.com/page
example.com/LANG/page
example.com/page?hl=LANG
example.com/page
? (It seems to be discouraged)How to ensure that all the translations are properly indexed?
Content-Language
header are enough?What is the best way to let the users know there are other translations, but do not distract them?
What is the best policy to deal with missing/outdated translations?
What else should I take into account? What should I do and what I definitely should not?
In addition to @Quassnoi's answers ensure that you standard RFC 4646 language identifiers (e.g. EN-US, DE-AT); you may already be aware of this. The CLDR project is an excellent repository of internationalization data (the Supplemental Data is really useful).
If a translation of a specific page is not available, use a language fallback mechanism back to the neutral language; for example "DE-AT", "DE", "" (neutral, e.g. "EN").
Most recent browsers and the underlying operating systems will correctly show all of the characters required for a locale selector list if the page is encoded correctly (I'd recommend all pages being UTF-8). Ensure that the locale list contains both the native and current-language names to allow both native and non-native speakers to view the specified translations, e.g. "Deutsch (German)" if the current locale is EN-*
.
A lot of sites use a flag icon to show the current locale, but this is more relevant to the location and some people may be offended if you show only a dominant flag (e.g. the US or UK flag for English).
It may be worthwhile to have a more visible (semi-graphical) locale selector on the home page if no locale cookie has been submitted, using a combination of GeoIP and Accept-Language to determine the default locale choice.
Semi-related: if your users are in located in different time zones include a zone preference in their account profile for displaying time values in their local time. And store all time stamps using UTC.
Make the decision whether you need support for languages that require double byte characters early on (Chinese, Japanese, Korean, etc), Unicode is the preferable choice. It can be tedious to change later, especially if you have a database that doesn't use unicode.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With