
Change all accented letters to normal letters in C++

Tags: c++, c, string

The question

How can you change all accented letters to normal letters in C++ (or in C)?

By that, I mean something like eéèêaàäâçc would become eeeeaaaacc.

What I've already tried

I've tried parsing the string manually and replacing the accented letters one by one, but I suspect there is a better/simpler way that I am not aware of (one that would guarantee I do not forget any accented letter).

I am wondering if there is already a map somewhere in the standard library, or if all the accented characters can easily be mapped to the "normal" letter using some mathematical function (e.g. floor(charCode-131/5) + 61).

OneMore asked Dec 30 '12 21:12



2 Answers

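// Assumes a single-byte ISO-8859-1 (Latin-1) or Windows-1252 string.
// It does not work on UTF-8 input, where these letters take two bytes.
char* removeAccented( char* str ) {
    // Replacement for each character code from 192 to 255:
    //   "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ"
    static const char tr[] =
        "AAAAAAECEEEEIIIIDNOOOOOxOUUUUYPsaaaaaaeceeeeiiiidnooooo/ouuuuypy";
    for ( char* p = str; *p != 0; ++p ) {
        // Go through unsigned char so codes above 127 are not negative.
        unsigned char ch = (unsigned char)*p;
        if ( ch >= 192 ) {
            *p = tr[ ch - 192 ];
        }
    }
    return str; // http://stackoverflow.com/questions/14094621/
}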
Adolfo answered Sep 30 '22 16:09


You should first define what you mean by "accented letters". What has to be done differs greatly depending on whether you have, say, 8-bit extended ASCII with a national code page for codes above 127, or a UTF-8 encoded string.

However, you should have a look at libicu, which provides what is needed for proper Unicode-aware manipulation of accented letters.

But it won't solve all problems for you. For instance, what should you do if you get a Chinese or Russian letter? What should you do with the Turkish dotted uppercase İ? Remove the dot? Doing so would change the meaning of the text. These kinds of problems are endless with Unicode. Even the conventional sorting order depends on the country.
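
To illustrate why the encoding matters, here is a minimal sketch of the table-lookup idea applied to UTF-8 input instead of a single-byte code page. The function name foldLatin1Utf8 is hypothetical and not taken from any library. It relies on the fact that U+00C0 to U+00FF are encoded in UTF-8 as the byte 0xC3 followed by a byte from 0x80 to 0xBF. It covers only that range: letters such as ő or ł, and letters written with separate combining marks, pass through unchanged, as does every other script.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replaces UTF-8 encoded U+00C0..U+00FF with an ASCII
// look-alike and copies every other byte through untouched.
std::string foldLatin1Utf8(const std::string& in) {
    // One replacement per code point from U+00C0 to U+00FF (64 entries).
    static const char table[] =
        "AAAAAAECEEEEIIIIDNOOOOOxOUUUUYPs"
        "aaaaaaeceeeeiiiidnooooo/ouuuuypy";
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        if (lead == 0xC3 && i + 1 < in.size()) {
            unsigned char next = static_cast<unsigned char>(in[i + 1]);
            if (next >= 0x80 && next <= 0xBF) {
                // 0xC3 0x80..0xBF decodes to U+00C0..U+00FF.
                out.push_back(table[next - 0x80]);
                ++i; // skip the continuation byte just consumed
                continue;
            }
        }
        out.push_back(in[i]); // ASCII or any other sequence: keep as is
    }
    return out;
}
```

Called on the UTF-8 string eéèêaàäâçc from the question, this returns eeeeaaaacc. Anything beyond this one block of code points is where a library such as libicu becomes the more realistic option.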

kriss answered Sep 30 '22 16:09