
How to get an "English name" for a character?

I was just using this most helpful link: How do I check if a given string is a legal / valid file name under Windows?

Inside some validation code I have something like the following. (Ignore that I'm not using a StringBuilder, and ignore the bug in forming the message: there's no need to report 'Colon' more than once if it shows up in the string more than once.)

string InvalidFileNameChars = new string(Path.GetInvalidFileNameChars());
Regex ContainsABadChar = new Regex("[" + Regex.Escape(InvalidFileNameChars) + "]");

MatchCollection BadChars = ContainsABadChar.Matches(txtFileName.Text);
if (BadChars.Count > 0)
{
    string Msg = "The following invalid characters were detected:\r\n\r\n";
    foreach (Match Bad in BadChars)
    {
        Msg += Bad.Value + "\r\n";
    }
    MessageBox.Show(Msg, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
    return;
}

For example, if a colon is found, that MessageBox will look something like:

-- begin --

The following invalid characters were detected:

:

-- end --

I'd like it to say something like:

-- begin --

The following invalid characters were detected:

Colon -> :

-- end --

I like having the English name. It's not a deal-breaker, but I was curious whether there's a function out there like the following (it doesn't exist on the Char class, but may exist in some other class I'm not thinking of):

Char.GetEnglishName(':');

JustLooking asked Jan 19 '12 18:01




2 Answers

I compiled a dictionary of character names, gathered from various sources, for a personal tool I made to search through Unicode characters: http://jumpingfishes.com/unicodechars.htm

The dictionary is expressed as a JavaScript array and contains 20,761 definitions. Feel free to borrow my JavaScript to create a C# dictionary:
http://jumpingfishes.com/unicodeDescriptions.js

Edit: Better yet, here's the text file I used to generate my JavaScript. It might be a slightly easier source to parse when generating a C# dictionary. Each line contains the character code in hex, followed by a tab, followed by the character description.
http://jumpingfishes.com/unicodeDictionary.txt
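As a rough sketch of how that text file could be turned into a C# dictionary, something like the following might work. It assumes each line is a hex code, a tab, then the description, as described above; the class and method names (`UnicodeNameParser`, `Parse`) are made up for illustration, and the optional `0x` prefix handling is a guess, since the exact layout of the file isn't shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper; not part of the original answer.
public static class UnicodeNameParser
{
    // Parses lines of the form "<hex code>\t<description>" into a dictionary
    // keyed by character code. Lines that don't match are skipped.
    public static Dictionary<int, string> Parse(IEnumerable<string> lines)
    {
        var names = new Dictionary<int, string>();
        foreach (string line in lines)
        {
            // Split on the first tab only, so descriptions may contain tabs.
            string[] parts = line.Split(new[] { '\t' }, 2);
            if (parts.Length != 2)
                continue;

            string hex = parts[0].Trim();
            // Assumption: tolerate an optional "0x" prefix on the code.
            if (hex.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
                hex = hex.Substring(2);

            int code;
            if (int.TryParse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out code))
                names[code] = parts[1].Trim();
        }
        return names;
    }
}
```

With a local copy of the file, `UnicodeNameParser.Parse(File.ReadLines(path))` would load it, and `names[(int)':']` would then give the description for a colon, provided the file has an entry for it.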

gilly3 answered Sep 27 '22 21:09


If you don't need to account for every character ever, you can just use the Basic Latin Unicode block (U+0000 to U+007F, which includes the C0 control characters).

You can define the table as a simple string array to make lookups fast:

string[] lookup = new string[128];
lookup[0x00]="Null character";
lookup[0x01]="Start of Heading";
lookup[0x02]="Start of Text";
lookup[0x03]="End-of-text character";
lookup[0x04]="End-of-transmission character";
lookup[0x05]="Enquiry character";
lookup[0x06]="Acknowledge character";
lookup[0x07]="Bell character";
lookup[0x08]="Backspace";
lookup[0x09]="Horizontal tab";
lookup[0x0A]="Line feed";
lookup[0x0B]="Vertical tab";
lookup[0x0C]="Form feed";
lookup[0x0D]="Carriage return";
lookup[0x0E]="Shift Out";
lookup[0x0F]="Shift In";
lookup[0x10]="Data Link Escape";
lookup[0x11]="Device Control 1";
lookup[0x12]="Device Control 2";
lookup[0x13]="Device Control 3";
lookup[0x14]="Device Control 4";
lookup[0x15]="Negative-acknowledge character";
lookup[0x16]="Synchronous Idle";
lookup[0x17]="End of Transmission Block";
lookup[0x18]="Cancel character";
lookup[0x19]="End of Medium";
lookup[0x1A]="Substitute character";
lookup[0x1B]="Escape character";
lookup[0x1C]="File Separator";
lookup[0x1D]="Group Separator";
lookup[0x1E]="Record Separator";
lookup[0x1F]="Unit Separator";
lookup[0x20]="Space";
lookup[0x21]="Exclamation mark";
lookup[0x22]="Quotation mark";
lookup[0x23]="Number sign";
lookup[0x24]="Dollar sign";
lookup[0x25]="Percent sign";
lookup[0x26]="Ampersand";
lookup[0x27]="Apostrophe";
lookup[0x28]="Left parenthesis";
lookup[0x29]="Right parenthesis";
lookup[0x2A]="Asterisk";
lookup[0x2B]="Plus sign";
lookup[0x2C]="Comma";
lookup[0x2D]="Hyphen-minus";
lookup[0x2E]="Full stop";
lookup[0x2F]="Slash";
lookup[0x30]="Digit Zero";
lookup[0x31]="Digit One";
lookup[0x32]="Digit Two";
lookup[0x33]="Digit Three";
lookup[0x34]="Digit Four";
lookup[0x35]="Digit Five";
lookup[0x36]="Digit Six";
lookup[0x37]="Digit Seven";
lookup[0x38]="Digit Eight";
lookup[0x39]="Digit Nine";
lookup[0x3A]="Colon";
lookup[0x3B]="Semicolon";
lookup[0x3C]="Less-than sign";
lookup[0x3D]="Equal sign";
lookup[0x3E]="Greater-than sign";
lookup[0x3F]="Question mark";
lookup[0x40]="At sign";
lookup[0x41]="Latin Capital letter A";
lookup[0x42]="Latin Capital letter B";
lookup[0x43]="Latin Capital letter C";
lookup[0x44]="Latin Capital letter D";
lookup[0x45]="Latin Capital letter E";
lookup[0x46]="Latin Capital letter F";
lookup[0x47]="Latin Capital letter G";
lookup[0x48]="Latin Capital letter H";
lookup[0x49]="Latin Capital letter I";
lookup[0x4A]="Latin Capital letter J";
lookup[0x4B]="Latin Capital letter K";
lookup[0x4C]="Latin Capital letter L";
lookup[0x4D]="Latin Capital letter M";
lookup[0x4E]="Latin Capital letter N";
lookup[0x4F]="Latin Capital letter O";
lookup[0x50]="Latin Capital letter P";
lookup[0x51]="Latin Capital letter Q";
lookup[0x52]="Latin Capital letter R";
lookup[0x53]="Latin Capital letter S";
lookup[0x54]="Latin Capital letter T";
lookup[0x55]="Latin Capital letter U";
lookup[0x56]="Latin Capital letter V";
lookup[0x57]="Latin Capital letter W";
lookup[0x58]="Latin Capital letter X";
lookup[0x59]="Latin Capital letter Y";
lookup[0x5A]="Latin Capital letter Z";
lookup[0x5B]="Left Square Bracket";
lookup[0x5C]="Backslash";
lookup[0x5D]="Right Square Bracket";
lookup[0x5E]="Circumflex accent";
lookup[0x5F]="Low line";
lookup[0x60]="Grave accent";
lookup[0x61]="Latin Small Letter A";
lookup[0x62]="Latin Small Letter B";
lookup[0x63]="Latin Small Letter C";
lookup[0x64]="Latin Small Letter D";
lookup[0x65]="Latin Small Letter E";
lookup[0x66]="Latin Small Letter F";
lookup[0x67]="Latin Small Letter G";
lookup[0x68]="Latin Small Letter H";
lookup[0x69]="Latin Small Letter I";
lookup[0x6A]="Latin Small Letter J";
lookup[0x6B]="Latin Small Letter K";
lookup[0x6C]="Latin Small Letter L";
lookup[0x6D]="Latin Small Letter M";
lookup[0x6E]="Latin Small Letter N";
lookup[0x6F]="Latin Small Letter O";
lookup[0x70]="Latin Small Letter P";
lookup[0x71]="Latin Small Letter Q";
lookup[0x72]="Latin Small Letter R";
lookup[0x73]="Latin Small Letter S";
lookup[0x74]="Latin Small Letter T";
lookup[0x75]="Latin Small Letter U";
lookup[0x76]="Latin Small Letter V";
lookup[0x77]="Latin Small Letter W";
lookup[0x78]="Latin Small Letter X";
lookup[0x79]="Latin Small Letter Y";
lookup[0x7A]="Latin Small Letter Z";
lookup[0x7B]="Left Curly Bracket";
lookup[0x7C]="Vertical bar";
lookup[0x7D]="Right Curly Bracket";
lookup[0x7E]="Tilde";
lookup[0x7F]="Delete";

Then, all you need to do is:

var englishName = lookup[(int)'~'];

Or:

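// Extension methods must be declared in a static class, and lookup must be
// a static field that the class can access.
public static string ToEnglishName(this char c)
{
    int i = c; // char converts implicitly to int
    if (i < lookup.Length)
        return lookup[i];
    return "Unknown";
}

var name = ':'.ToEnglishName(); // Colon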
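To tie this back to the question, here is one possible sketch of building the "Colon -> :" message. It also uses a StringBuilder and lists each invalid character only once. The `InvalidCharReport` class and its `Build` method are hypothetical names, and the name lookup is passed in as a delegate so the sketch doesn't depend on any particular table.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; not part of the original answer.
public static class InvalidCharReport
{
    // Returns a message listing each distinct invalid character found in input
    // as "<name> -> <char>", or an empty string if none are found.
    // nameOf is any char-to-name function, e.g. c => c.ToEnglishName().
    public static string Build(string input, IEnumerable<char> invalidChars, Func<char, string> nameOf)
    {
        var invalid = new HashSet<char>(invalidChars);
        var seen = new HashSet<char>();
        var msg = new StringBuilder("The following invalid characters were detected:\r\n\r\n");

        foreach (char c in input)
        {
            // HashSet.Add returns false for duplicates, so each character is reported once.
            if (invalid.Contains(c) && seen.Add(c))
                msg.Append(nameOf(c)).Append(" -> ").Append(c).Append("\r\n");
        }

        return seen.Count > 0 ? msg.ToString() : "";
    }
}
```

In the question's validation code this might be called as `InvalidCharReport.Build(txtFileName.Text, Path.GetInvalidFileNameChars(), c => c.ToEnglishName())`, showing the MessageBox only when the result is non-empty.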
Ryan Emerle answered Sep 27 '22 20:09