
ANSI vs Shift-JIS vs UTF-8 in C#

I have been trying to figure out the difference for quite some time now. The issue is with a file in ANSI encoding that contains Japanese text, which shows up as: ­‚È‚­‚Æ‚à1‚‚ÌINCREMENTs‚ª•K—v‚Å‚·. Read as Shift-JIS, the same text is 少なくとも1つのINCREMENT行が必要です, which is the Japanese I expect.

I need to read these characters from the file (in ANSI) and display them on a web page. Other files, which are in UTF-8, display correctly and do not show this problem. I am finding it difficult to work out what the difference is and how to change the encoding so that this file is handled correctly. I use C# to read the file and display it, and I also need to write the string back to the file if it is modified on the web page. Which encodings should I use for reading and writing here?

asked Apr 18 '12 12:04 by remo




1 Answer

As far as code pages are concerned, "ANSI" (and Encoding.Default on .NET Framework; on .NET Core and later Encoding.Default is always UTF-8) basically just means "the non-Unicode code page used by this system". Exactly which code page that is depends on how the system is configured, but on a Western European system it's likely to be Windows-1252.

On the system that text comes from, "ANSI" appears to mean Shift-JIS (code page 932), so unless your system uses the same code page, you'll need to tell your code to read the text as Shift-JIS.
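To see why the text looks garbled, it may help to decode the same bytes with both code pages. The following is only a sketch: the class and method names (MojibakeDemo, DecodeBothWays) are made up for illustration, and it assumes that code pages 932 and 1252 are available. On .NET Core and .NET 5+ that requires registering CodePagesEncodingProvider first; on .NET Framework the registration line should not be needed.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Hypothetical helper: encodes Japanese text as Shift-JIS bytes, then
    // decodes those same bytes once as Windows-1252 and once as Shift-JIS.
    public static (string AsWindows1252, string AsShiftJis) DecodeBothWays(string japanese)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered before Encoding.GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding shiftJis = Encoding.GetEncoding(932);
        Encoding windows1252 = Encoding.GetEncoding(1252);

        // These are the bytes a Japanese "ANSI" system would store in the file.
        byte[] bytes = shiftJis.GetBytes(japanese);

        // Wrong code page gives garbage; the right one restores the text.
        return (windows1252.GetString(bytes), shiftJis.GetString(bytes));
    }
}
```

Calling MojibakeDemo.DecodeBothWays("必要です") should return the original text in AsShiftJis, while AsWindows1252 should look like the tail end of the garbled string in the question. The bytes on disk are identical in both cases; only the code page used to interpret them differs.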

Assuming you're reading the file with a StreamReader, there are various constructors that take an Encoding, so just grab a Shift-JIS encoding with Encoding.GetEncoding("shift_jis") or Encoding.GetEncoding(932) and use it to construct your StreamReader.
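A minimal sketch of that, including writing the text back with the same encoding, could look like the following. The class and method names (ShiftJisFile, Read, Write) are hypothetical; only the Encoding, StreamReader and StreamWriter calls are framework APIs. It assumes the file really is Shift-JIS, and the provider registration is only needed on .NET Core and .NET 5+.

```csharp
using System;
using System.IO;
using System.Text;

static class ShiftJisFile
{
    static Encoding GetShiftJis()
    {
        // Assumption: running on .NET Core / .NET 5+, where code page 932 is
        // not available until the code pages provider is registered.
        // On .NET Framework this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(932); // same as Encoding.GetEncoding("shift_jis")
    }

    // Reads the whole file, decoding its bytes as Shift-JIS.
    public static string Read(string path)
    {
        using (var reader = new StreamReader(path, GetShiftJis()))
        {
            return reader.ReadToEnd();
        }
    }

    // Overwrites the file, encoding the text as Shift-JIS.
    public static void Write(string path, string text)
    {
        using (var writer = new StreamWriter(path, false, GetShiftJis()))
        {
            writer.Write(text);
        }
    }
}
```

Once the text has been decoded into a .NET string, the file's original encoding no longer matters, so the page itself can be served as UTF-8 like the other files. The Shift-JIS encoding only comes back into play when the edited string is written to the file again.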

answered Sep 27 '22 21:09 by Michael Madsen
