 

Allowing Simplified Chinese Input

The company I work for is bidding on a project that will require our eCommerce solution to accept simplified Chinese input. After doing a bit of research, it seems that ASP.net makes globalization configuration easy:

<configuration>
  <system.web>
    <globalization
      fileEncoding="utf-8"
      requestEncoding="utf-8"
      responseEncoding="utf-8"
      culture="zh-Hans"
      uiCulture="en-us" />
  </system.web>
</configuration>

Questions:

  1. Is this really all there is to it in ASP.NET? It seems too good to be true.
  2. Are there any DB considerations with SQL Server 2005? Will the DB accept the simplified Chinese without additional configuration?
James Hill asked Mar 23 '12 18:03


2 Answers

Ad 1. The real question is how far you want to go with internationalization, because i18n is more than allowing Unicode input. At a minimum you need to support local date, time, and number formats and local collation (mostly relevant to sorting), and make sure your application runs correctly on localized operating systems (unless you are developing a cloud, i.e. hosted, solution). You might want to read more on the topic here.
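
To make the point about local formats concrete, here is a minimal sketch of culture-sensitive formatting in .NET. The class name FormatDemo and the sample values are made up for illustration; the exact strings printed depend on the culture data available on the machine, so none are shown here.

```csharp
using System;
using System.Globalization;

public static class FormatDemo
{
    // Formats a date and a number using the conventions of the given culture.
    public static string Describe(string cultureName, DateTime when, decimal amount)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        string date = when.ToString("D", culture);       // long date pattern
        string number = amount.ToString("N2", culture);  // grouped number, 2 decimals
        return cultureName + ": " + date + " | " + number;
    }

    public static void Main()
    {
        DateTime when = new DateTime(2012, 3, 23);
        foreach (string name in new[] { "en-US", "zh-CN", "de-DE" })
        {
            // Same values, different presentation per culture.
            Console.WriteLine(Describe(name, when, 1234567.89m));
        }
    }
}
```

The same idea applies to sorting: string comparisons that take a CultureInfo follow that culture's collation rules instead of raw code point order.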

As far as support for Chinese character input goes, if you are going to offer software in China, you need to support at least GB18030-2000. To do that, you need a .NET Framework version that supports Unicode 3.0; I believe that has been the case since .NET Framework 2.0.
However, if you want to go one step further (which might be required to gain a competitive edge), you might want to support GB18030-2005. The only problem is that full support for these characters (CJK Unified Ideographs Extension B) arrived later in the process (I am not really sure whether it was with Unicode 6.0 or Unicode 6.1). You might therefore be forced to use the latest .NET Framework and still not be sure that it covers everything.
You might want to read the Unicode FAQ on Han characters.

Ad 2. I strongly advise you not to use SQL Server 2005 with Chinese characters. The reason is that the old SQL Server engine supports only UCS-2 rather than UTF-16. This might seem like a slight difference, but it poses a real problem with 4-byte Han ideographs: you won't be able to use them in queries (i.e. in LIKE or WHERE clauses), because you will receive all records. That's how it works. To support them, you would need to set a very specific Chinese collation, which will simply break support for other languages.
Basically, using SQL Server 2005 with Chinese ideographs is a bad idea.
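
To illustrate why the UCS-2 versus UTF-16 distinction matters, here is a small sketch on the .NET side (not SQL Server itself). A character from Extension B needs two 16-bit code units in UTF-16, a surrogate pair, and an engine that only understands UCS-2 sees those as two separate units rather than one character. The class name SurrogateDemo is made up for illustration.

```csharp
using System;
using System.Globalization;

public static class SurrogateDemo
{
    // Counts text elements (surrogate pairs count once) instead of UTF-16 code units.
    public static int TextElementCount(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    public static void Main()
    {
        string bmp = "\u4E2D";                         // U+4E2D, inside the Basic Multilingual Plane
        string extB = char.ConvertFromUtf32(0x20000);  // U+20000, first code point of Extension B

        Console.WriteLine(bmp.Length);                     // 1 code unit
        Console.WriteLine(extB.Length);                    // 2 code units (a surrogate pair)
        Console.WriteLine(char.IsSurrogatePair(extB, 0));  // True
        Console.WriteLine(TextElementCount(extB));         // 1 text element
    }
}
```

Any code that measures, truncates, or compares such strings per 16-bit unit can split a pair in half, which is the kind of problem described above.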

Paweł Dyda answered Oct 17 '22 18:10



First off, are you sure that you picked the right culture identifier with zh-Hans, which is a neutral culture? It may be more appropriate to target a specific culture, such as zh-CN (Chinese as used in China), if that is the market you are aiming to support.

Secondly, using the web.config file to set the culture is fine if you are planning a deployment that targets this culture exclusively. Often, though, you'll want a single deployment to adapt dynamically to the end user's culture. In that case you would programmatically set Thread.CurrentCulture (and Thread.CurrentUICulture as well, if you are providing localized resources) based on, for example, a URL scheme (e.g. www.myapp.com would use en-US and www.myapp.com/china would use zh-CN), the Accept-Language header, or an in-app language selector.
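
As a rough sketch of that per-request approach: the CultureResolver class, its method names, the /china path rule, and the supported-culture list below are all hypothetical. In an ASP.NET application this logic would typically run early in the request, for example in Application_BeginRequest in Global.asax; here it is written as a console program so it can run on its own.

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class CultureResolver
{
    // Hypothetical URL rule: paths starting with /china use zh-CN.
    // (A real rule would match whole path segments, not just a prefix.)
    public static string FromPath(string path)
    {
        if (path != null && path.StartsWith("/china", StringComparison.OrdinalIgnoreCase))
            return "zh-CN";
        return null;
    }

    // Returns the first Accept-Language entry found in the supported list.
    // q-values are ignored to keep the sketch short.
    public static string FromAcceptLanguage(string header, string[] supported)
    {
        if (string.IsNullOrEmpty(header)) return null;
        foreach (string part in header.Split(','))
        {
            string tag = part.Split(';')[0].Trim();
            foreach (string candidate in supported)
            {
                if (string.Equals(tag, candidate, StringComparison.OrdinalIgnoreCase))
                    return candidate;
            }
        }
        return null;
    }

    // URL wins over the header; en-US is the assumed default.
    public static CultureInfo Resolve(string path, string acceptLanguage)
    {
        string[] supported = { "en-US", "zh-CN" };
        string name = FromPath(path)
            ?? FromAcceptLanguage(acceptLanguage, supported)
            ?? "en-US";
        return CultureInfo.GetCultureInfo(name);
    }

    public static void Main()
    {
        CultureInfo culture = Resolve("/china/products", "en-US,en;q=0.8");
        Thread.CurrentThread.CurrentCulture = culture;    // formatting, parsing, sorting
        Thread.CurrentThread.CurrentUICulture = culture;  // resource lookup
        Console.WriteLine(Thread.CurrentThread.CurrentCulture.Name);
    }
}
```

An in-app language selector would slot in as one more source ahead of the header, usually read from a cookie or the user's profile.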

Other than the Unicode limitations that Paweł refers to (which mean that you may really need to use the latest .NET Framework/SQL Server), there isn't anything specific you should need to do for simplified Chinese -- if you follow standard internationalization guidelines you should be all set. Perhaps you should consider localizing (translating) your app into Chinese as part of this, by the way.

About SQL Server, Paweł's points seem pretty clear. That said, so long as you use nvarchar data types (Unicode) and you don't filter or sort on these columns on the DB side, I'd be surprised if you had any issues on SQL Server 2005. So it really depends on what you do with this data.

Clafou answered Oct 17 '22 17:10
