Localization for REST APIs

I am starting this discussion to gather more information on localization practices for APIs. HTTP does not seem to provide sufficient guidance, and the current state of practice is not much better.

The basic problem is that APIs may need to provide content that depends on the user's culture, country, language and timezone. For example, a German user would like to read messages in German, with European-style dates and numbers, metric units, amounts in euros, and times in Central European Time.

Reading RFC 7231 Section 5.3.5 (Accept-Language) and then RFC 4647, one might think that Accept-Language is sophisticated enough and is the right tool. There are several notable shortcomings, though:

  1. Language tags may not be precise enough. For example, a user may request only a language without a country code, as in "de, en;q=0.8", which leaves the regional formatting ambiguous.
  2. Even if the user supplies both language and country preferences, it is not clear how to tie together the selection of the message locale and the value formatting locale. For example, suppose a user requests "hu-HU, en-US;q=0.9", the application lacks Hungarian messages, and it is written in Java, which knows how to format dates in Hungarian. Should the app use English messages with Hungarian dates, or English messages with US dates? The actual situation may be more complex.
  3. Timezone is not part of language tags, and there seems to be no standard HTTP header for it.
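
To make the first two points concrete, here is a minimal Python sketch of what a server actually receives in Accept-Language. The function name parse_accept_language is made up for illustration, and dropping q=0 entries and treating malformed weights as 0 are simplifying assumptions, not something the RFCs prescribe in this form.

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (language_range, q) pairs, best first."""
    ranges = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue  # skip empty list elements such as a trailing comma
        q = 1.0  # RFC 7231: a missing q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: treat a malformed weight as "not acceptable"
        if q > 0:  # simplification: q=0 entries are dropped
            ranges.append((q, position, tag))
    # Sort by descending weight; the original position breaks ties.
    ranges.sort(key=lambda r: (-r[0], r[1]))
    return [(tag, q) for q, _, tag in ranges]


print(parse_accept_language("de, en;q=0.8"))
# [('de', 1.0), ('en', 0.8)]
```

The result for "de, en;q=0.8" carries no region at all, so the server still has to guess between, say, German and Austrian or Swiss conventions, and nothing in the list says which timezone to use.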

I see that Microsoft has thought about #2 in ASP.NET and introduced the notions of Culture and UICulture to separate the selection of message language from formatting.

In the Java world, Spring has introduced TimeZoneAwareLocaleContext to address #3.

The W3C has issued guidance on using Accept-Language for locale settings. It more or less says that Accept-Language is not enough.

So what is your thinking?

  1. Do you know of APIs that solve this problem in a comprehensive way? Pointers?
  2. Should APIs accept multiple values for selecting message language, value formatting locale and timezone?
  3. Should Accept-Language be used at all?

Kiril asked Dec 27 '18 13:12



2 Answers

OK, here is a summary of how I answered my own question. I hope this helps future API authors.

The fundamental requirements for a UI built on top of an API, excluding currency presentation, seem to be:

  1. Select the best language out of the available product translations using an RFC 4647 list of language ranges
  2. Select the best data format out of the available ones using an RFC 4647 list of language ranges
  3. Allow clients to provide distinct preferences for translation and format. There will be cases where people will not find the best translation and yet prefer to see the proper formatting aligned with their culture.
  4. Allow clients to specify a timezone using IANA TZDB identifiers
  5. Format data elements using Unicode CLDR http://cldr.unicode.org/
  6. Use named placeholders in localization bundles e.g. "{drive} is corrupt" is easier to translate properly than "{1} is corrupt"
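
Requirements 1 to 3 and 6 can be sketched in a few lines of Python. This is a hand-written approximation of the RFC 4647 lookup algorithm (section 3.4), not a library API. The names lookup, MESSAGES and FORMAT_LOCALES, the message key, and the German translation are all made up for illustration.

```python
def lookup(ranges, available, default):
    """RFC 4647 style lookup: return the best available tag for a priority list."""
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in ranges:
        if language_range == "*":
            continue  # the wildcard range is skipped by lookup
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()  # truncate from the end: "hu-HU" becomes "hu"
            # A single-letter subtag cannot end a tag, so drop it as well.
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


# Hypothetical translation bundles using named placeholders.
MESSAGES = {
    "en": {"drive_corrupt": "{drive} is corrupt"},
    "de": {"drive_corrupt": "{drive} ist beschädigt"},
}
# Hypothetical set of locales the formatter supports (wider than the translations).
FORMAT_LOCALES = ["en-US", "de-DE", "hu-HU"]

preferences = ["hu-HU", "en-US"]
translation = lookup(preferences, MESSAGES, "en")
formatting = lookup(preferences, FORMAT_LOCALES, "en-US")

print(translation, formatting)
# en hu-HU
print(MESSAGES[translation]["drive_corrupt"].format(drive="C:"))
# C: is corrupt
```

Running the two lookups separately is what lets the Hungarian user get English messages together with Hungarian formatting when no Hungarian translation exists.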

For the REST HTTP headers, I suggest using three headers:

  1. accept-language - used for selecting the translation, following the guidelines of RFC 7231 https://www.rfc-editor.org/rfc/rfc7231#section-5.3.5
  2. format-locale - used to select the data formatting style if it differs from the translation language preferences. Again a list of language ranges. Defaults to accept-language if omitted.
  3. timezone - used to select the timezone for rendering date and time values. This should be a valid timezone ID from the IANA TZDB https://www.iana.org/time-zones
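
A minimal Python sketch of how a server might resolve these three headers follows. Note that format-locale and timezone are the custom headers proposed above, not standard HTTP headers. The function name resolve_preferences and the choice to fall back to UTC on a missing or unknown timezone are assumptions for illustration.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def resolve_preferences(headers, default_timezone="UTC"):
    """Resolve translation, formatting and timezone preferences from request headers."""
    # HTTP header names are case-insensitive.
    normalized = {name.lower(): value for name, value in headers.items()}
    translation = normalized.get("accept-language", "")
    # format-locale defaults to accept-language when omitted.
    formatting = normalized.get("format-locale") or translation
    tz_name = normalized.get("timezone") or default_timezone
    try:
        ZoneInfo(tz_name)  # validates the ID against the installed IANA TZDB
    except (ZoneInfoNotFoundError, ValueError):
        tz_name = default_timezone  # assumption: unknown IDs silently fall back
    return {"translation": translation, "format": formatting, "timezone": tz_name}


request_headers = {
    "Accept-Language": "en-US",
    "Format-Locale": "hu-HU",
    "Timezone": "Europe/Budapest",
}
print(resolve_preferences(request_headers))
```

A stricter API could answer an invalid timezone ID with a 400 response instead of falling back; which is better depends on how much the clients can be trusted to send valid values.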

Implementation-wise, Java 8 and later seem to have everything needed to implement a globalized application. Other languages and older Java versions seem to have varying degrees of issues.

Kiril answered Oct 02 '22 04:10


I would keep all data in a universal, locale-independent format: numbers with . as the decimal separator, dates and times in ISO 8601 and in UTC, and so on.
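
As a rough illustration in Python, assuming a JSON payload (the function name to_wire and the field names amount and created_at are made up):

```python
import json
from datetime import datetime, timedelta, timezone


def to_wire(amount, created_at):
    """Serialize values in a locale-independent form: JSON number, ISO 8601 in UTC."""
    if created_at.tzinfo is None:
        raise ValueError("created_at must be timezone-aware")
    utc = created_at.astimezone(timezone.utc)
    return json.dumps({
        "amount": amount,  # JSON numbers always use "." as the decimal separator
        "created_at": utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
    })


cet = timezone(timedelta(hours=1))  # fixed +01:00 offset standing in for CET
print(to_wire(1234.5, datetime(2018, 12, 27, 14, 12, tzinfo=cet)))
# {"amount": 1234.5, "created_at": "2018-12-27T13:12:00Z"}
```

The client then formats the number and the timestamp for display in whatever locale and timezone the user has chosen.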

Provide localized text only if it is absolutely necessary. In that case, get the locale from the Accept-Language header field, and if you have the localized string, return it. If not, fall back to the string you do have.

For example, you might have a multilingual product database that contains product data in several languages. When you write an API for the database, you can select the product data in the user's language (if available).
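
A small Python sketch of that fallback, using made-up product data (PRODUCTS, the SKUs and product_name are hypothetical) and assuming the preferred languages have already been parsed from Accept-Language, best first:

```python
# Hypothetical multilingual product data, keyed by SKU and primary language.
PRODUCTS = {
    "sku-1": {"en": "Garden chair", "de": "Gartenstuhl"},
    "sku-2": {"en": "Watering can"},
}


def product_name(sku, preferred_languages, fallback="en"):
    """Return the product name in the first preferred language that exists."""
    names = PRODUCTS[sku]
    for language in preferred_languages:
        primary = language.lower().split("-")[0]  # "de-DE" is matched as "de"
        if primary in names:
            return names[primary]
    return names[fallback]  # no match: return the string we do have


print(product_name("sku-1", ["de-DE", "en"]))
# Gartenstuhl
print(product_name("sku-2", ["de-DE"]))
# Watering can
```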

Here is a sample.

Jaska answered Oct 02 '22 04:10