Removing accents from a QString [duplicate]

Uploaded: 2022-09-15 15:05:05

Question

I want to remove accents, and more generally diacritical marks, from a string in order to perform an accent-insensitive search. After some reading about Unicode character categories, I've come up with this:

 QString unaccent(const QString s)
 {
   QString s2 = s.normalized(QString::NormalizationForm_D);
   QString out;
   for (int i=0,j=s2.length(); i<j; i++)
   {
     // strip diacritic marks
     if (s2.at(i).category()!=QChar::Mark_NonSpacing &&
         s2.at(i).category()!=QChar::Mark_SpacingCombining)
     {
          out.append(s2.at(i));
     }
   }
   return out;
 }

It appears to work reasonably well for Latin-based languages, but I'm wondering how adequate it is for other scripts (Arabic, Cyrillic, CJK, ...), which I cannot test because I lack the cultural understanding of them.

Specifically, I would like to know:

What Unicode normalization form is better suited for this problem: NormalizationForm_KD or NormalizationForm_D?
Is it sufficient to remove the characters belonging to Mark_NonSpacing and Mark_SpacingCombining categories or should it include more categories?
Are there other improvements to the above code that would make it work as well as possible for all languages?

Heitor · Accepted Answer

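#include <QRegExp>
#include <QString>

QString unaccent(const QString &s)
{
    QString output(s.normalized(QString::NormalizationForm_D));
    // Keeps only ASCII letters and whitespace, so digits, punctuation and
    // non-Latin letters are removed along with the combining marks.
    return output.replace(QRegExp("[^a-zA-Z\\s]"), "");
}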
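
The character class in this answer keeps only ASCII letters and whitespace, whereas the loop in the question removes only the combining marks. The difference matters for the non-Latin scripts the question asks about. As a rough illustration, here is a minimal sketch of the mark-filtering step in plain C++ without Qt. It rests on two assumptions that are not part of the original post: the input is already decomposed (the standard library has no normalizer, so this stands in for the result of QString::normalized()), and the hypothetical helper isCombiningMark() approximates the category test with a few Unicode block ranges instead of the full tables that QChar::category() consults.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true for code points in the main combining-mark blocks.
// This is a simplification; a real category lookup covers many more marks.
bool isCombiningMark(char32_t c)
{
    return (c >= 0x0300 && c <= 0x036F)   // Combining Diacritical Marks
        || (c >= 0x1AB0 && c <= 0x1AFF)   // Combining Diacritical Marks Extended
        || (c >= 0x1DC0 && c <= 0x1DFF)   // Combining Diacritical Marks Supplement
        || (c >= 0x20D0 && c <= 0x20FF)   // Combining Diacritical Marks for Symbols
        || (c >= 0xFE20 && c <= 0xFE2F);  // Combining Half Marks
}

// Assumes the input is already in a decomposed form (NFD or NFKD).
// Copies every code point except those recognized as combining marks.
std::u32string stripMarks(const std::u32string &decomposed)
{
    std::u32string out;
    out.reserve(decomposed.size());
    for (char32_t c : decomposed) {
        if (!isCombiningMark(c))
            out.push_back(c);
    }
    return out;
}
```

With this sketch, stripMarks(U"cafe\u0301 42") gives U"cafe 42": the digits and the space survive, and only U+0301 (combining acute accent) is dropped. A decomposed Cyrillic "й" (U+0438 followed by U+0306) becomes U+0438, that is "и", so the letter is kept rather than deleted, although it turns into a different letter of the alphabet. Whether that is acceptable for a search depends on the language.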


Tags: unicode, qt

Daniel Vérité
