
Manipulating a String: Removing special characters - Change all accented letters to non accented

Tags:

c#

regex

I'm using ASP.NET 4 and C#.

I have a string that can contain:

  • Special Characters, like: !"£$%&/()/#
  • Accented letters, like: àòèù
  • Empty spaces, like: " " (one or more in a row)

Example string:

#Hi this          is  rèally/ special strìng!!!

I would like to:

a) Remove all Special Characters, like:

Hi this          is  rèally special strìng

b) Convert all Accented letters to NON Accented letters, like:

Hi this          is  really special string

c) Replace each run of empty spaces with a single dash (-), like:

Hi-this-is-really-special-string

My aim is to create a string suitable for a URL path, for better SEO.

Any idea how to do this with a regular expression or another technique?

Thanks for your help on this!

GibboK asked Dec 17 '22 10:12


2 Answers

Similar to mathieu's answer, but tailored to your requirements. This solution normalizes the input, keeps only letters and spaces (which strips special characters and diacritics), and then replaces each run of whitespace with a single dash:

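using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

string s = "#Hi this          is  rèally/ special strìng!!!";
// FormD splits each accented letter into a base letter plus combining marks.
string normalized = s.Normalize(NormalizationForm.FormD);

StringBuilder resultBuilder = new StringBuilder();
foreach (char character in normalized)
{
    // Keep only letters and spaces; combining marks, digits and punctuation are dropped.
    UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(character);
    if (category == UnicodeCategory.LowercaseLetter
        || category == UnicodeCategory.UppercaseLetter
        || category == UnicodeCategory.SpaceSeparator)
    {
        resultBuilder.Append(character);
    }
}
// Collapse each run of whitespace into a single dash.
string result = Regex.Replace(resultBuilder.ToString(), @"\s+", "-");
// result: "Hi-this-is-really-special-string"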

See it in action at ideone.com.
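
The snippet above works on one hard-coded string. For use across a site, the same steps could be wrapped in a helper along the following lines. This is only a sketch: the names `SlugHelper` and `ToSlug` are hypothetical, and keeping digits and trimming the ends are assumptions that go slightly beyond the original snippet, which drops digits and turns leading or trailing spaces into dashes.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; the name is illustrative, not from the answer.
public static class SlugHelper
{
    public static string ToSlug(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        // Decompose accented letters into base letter + combining mark.
        string decomposed = input.Normalize(NormalizationForm.FormD);

        var kept = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
            // Assumption: digits are worth keeping in a URL segment.
            if (category == UnicodeCategory.LowercaseLetter
                || category == UnicodeCategory.UppercaseLetter
                || category == UnicodeCategory.DecimalDigitNumber
                || category == UnicodeCategory.SpaceSeparator)
            {
                kept.Append(c);
            }
        }

        // Trim first so leading/trailing spaces do not become dashes.
        return Regex.Replace(kept.ToString().Trim(), @"\s+", "-");
    }
}
```

Calling `SlugHelper.ToSlug("#Hi this          is  rèally/ special strìng!!!")` should give `Hi-this-is-really-special-string`. Letters that have no canonical decomposition (for example `ø` or `ß`) pass through unchanged, so they may need a separate mapping if plain ASCII is required.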

Jens answered May 04 '23 01:05


You should have a look at this answer: Ignoring accented letters in string comparison

Code here:

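// Requires: using System.Globalization; using System.Text;
static string RemoveDiacritics(string sIn)
{
  // FormD separates each base letter from its combining accent marks.
  string sFormD = sIn.Normalize(NormalizationForm.FormD);
  StringBuilder sb = new StringBuilder();

  foreach (char ch in sFormD)
  {
    // Skip the combining marks; keep every other character.
    UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
    if (uc != UnicodeCategory.NonSpacingMark)
    {
      sb.Append(ch);
    }
  }

  // Recompose whatever is left into its canonical composed form.
  return sb.ToString().Normalize(NormalizationForm.FormC);
}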
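
The method above handles only step b (removing accents). Since the question also asks about regular expressions, here is a rough sketch of all three steps written with `Regex.Replace`. The class and method names (`RegexSlug`, `ToSlug`) are hypothetical, and keeping digits is an assumption. Note that `\p{M}` matches every combining mark, which is slightly broader than the `NonSpacingMark` check used above.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; the name is illustrative, not from the answer.
public static class RegexSlug
{
    public static string ToSlug(string input)
    {
        // Step b: decompose, then drop all combining marks (\p{M}).
        string decomposed = input.Normalize(NormalizationForm.FormD);
        string noAccents = Regex.Replace(decomposed, @"\p{M}", "")
                                .Normalize(NormalizationForm.FormC);

        // Step a: drop everything except letters, digits and whitespace.
        // Assumption: digits should survive in the URL segment.
        string cleaned = Regex.Replace(noAccents, @"[^\p{L}\p{Nd}\s]", "");

        // Step c: trim the ends, then turn each whitespace run into one dash.
        return Regex.Replace(cleaned.Trim(), @"\s+", "-");
    }
}
```

With the example string from the question, `RegexSlug.ToSlug(...)` should return `Hi-this-is-really-special-string`. Unlike a filter on upper- and lowercase letters only, `\p{L}` also keeps letters from scripts without case, which may or may not be what you want in a URL.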

mathieu answered May 04 '23 00:05