Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Converting zenkaku characters to hankaku and vice-versa in C#

As it says in the header line, I want to convert zenkaku characters to hankaku ones and vice-vrsa in C#, but can't figure out how to do it. So, say "ラーメン" to "ラーメン" and the other way around. Would it be possible to write this in a method which determines automatically which way the conversion needs to go, based on the format of the input?

like image 234
yu_ominae Avatar asked Jun 22 '11 02:06

yu_ominae


2 Answers

You can use the Strings.StrConv() method by including a reference to Microsoft.VisualBasic.dll, or you can p/invoke the LCMapString() native function:

private const uint LOCALE_SYSTEM_DEFAULT = 0x0800;
private const uint LCMAP_HALFWIDTH = 0x00400000;

public static string ToHalfWidth(string fullWidth)
{
    StringBuilder sb = new StringBuilder(256);
    LCMapString(LOCALE_SYSTEM_DEFAULT, LCMAP_HALFWIDTH, fullWidth, -1, sb, sb.Capacity);
    return sb.ToString();
}

[DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
private static extern int LCMapString(uint Locale, uint dwMapFlags, string lpSrcStr, int cchSrc, StringBuilder lpDestStr, int cchDest);

and you can also do the reverse:

private const uint LCMAP_FULLWIDTH = 0x00800000;

public static string ToFullWidth(string halfWidth)
{
    StringBuilder sb = new StringBuilder(256);
    LCMapString(LOCALE_SYSTEM_DEFAULT, LCMAP_FULLWIDTH, halfWidth, -1, sb, sb.Capacity);
    return sb.ToString();
}

As for detecting the format of the input string, I'm not aware of an easy way without doing a conversion first and comparing results. (What if the string contains both full-width and half-width characters?)

like image 159
John Estropia Avatar answered Oct 16 '22 00:10

John Estropia


One approach is to compile a list of all characters you want to convert and how they map to each other, and then iterate the input string and replace all characters in the list with their equivalent.

var fullToHalf = new Dictionary<char, char>
{
    ...
    { '\u30E9', '\uFF97' }, // KATAKANA LETTER RA -> HALFWIDTH KATAKANA LETTER RA
    { '\u30EA', '\uFF98' }, // KATAKANA LETTER RI -> HALFWIDTH KATAKANA LETTER RI
    ...
};

var halfToFull = fullToHalf.ToDictionary(kv => kv.Value, kv => kv.Key);

var input = "\u30E9";

var isFullWidth = input.All(ch => fullToHalf.ContainsKey(ch));
var isHalfWidth = input.All(ch => halfToFull.ContainsKey(ch));

var result = new string(input.Select(ch => fullToHalf[ch]).ToArray());
// result == "\uFF97"

Unicode Chart: Halfwidth and Fullwidth Forms (FF00-FFEF)

like image 42
dtb Avatar answered Oct 16 '22 00:10

dtb