
Replace a list of invalid characters with their valid versions (like tr)

Tags:

string

c#

I need something like this imagined .trReplace method:

  str = str.trReplace("áéíüñ","aeiu&");

It should change this string:

  a stríng with inválid charactérs

to:

  a string with invalid characters

My current ideas are:

 str = str.Replace("á","a").Replace("é","e").Replace("í","i")...

and:

 var sb = new StringBuilder(str);
 sb.Replace("á","a");
 sb.Replace("é","e");
 sb.Replace("í","i"); ...

But I don't think they are efficient for long strings.

MiguelM asked May 30 '11 00:05


1 Answer

Richard has a good answer, but performance may suffer slightly on longer strings (about 25% slower than the straight string replace shown in the question). I felt compelled to look into this a little further. There are actually several good related answers already on StackOverflow, listed below:

Fastest way to remove chars from string

C# Stripping / converting one or more characters

There is also a good article on the CodeProject covering the different options.

http://www.codeproject.com/KB/string/fastestcscaseinsstringrep.aspx

The function provided in Richard's answer gets slower with longer strings because the replacements happen one character at a time; if the string has long runs of non-mapped characters, you waste cycles re-appending them to the result one by one. Taking a few points from the CodeProject article, you end up with a slightly modified version of Richard's answer that looks like:

// Requires: using System.Collections.Generic; using System.Text;
private static readonly Char[] ReplacementChars = new[] { 'á', 'é', 'í', 'ü', 'ñ' };
private static readonly Dictionary<Char, Char> ReplacementMappings = new Dictionary<Char, Char>
{
  { 'á', 'a' },
  { 'é', 'e' },
  { 'í', 'i' },
  { 'ü', 'u' },
  { 'ñ', '&' }
};

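private static string Translate(String source)
{
  var startIndex = 0;
  var currentIndex = 0;
  var result = new StringBuilder(source.Length);

  // Jump straight to the next mapped character instead of testing every character.
  while ((currentIndex = source.IndexOfAny(ReplacementChars, startIndex)) != -1)
  {
    // Copy the untouched run before the match, then the replacement character.
    result.Append(source.Substring(startIndex, currentIndex - startIndex));
    result.Append(ReplacementMappings[source[currentIndex]]);

    startIndex = currentIndex + 1;
  }

  // No mapped character was found: return the original string unchanged.
  if (startIndex == 0)
    return source;

  // Copy whatever follows the last replaced character.
  result.Append(source.Substring(startIndex));

  return result.ToString();
}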

NOTE Not all edge cases have been tested.

NOTE ReplacementChars could be replaced with ReplacementMappings.Keys.ToArray() (needs using System.Linq) for a slight cost.
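For the tr-style call the question asks for, the same IndexOfAny technique can be wrapped in an extension method that builds its lookup from two parallel strings. This is only a minimal sketch: the class name StringTranslateExtensions, the method name TrReplace (borrowed from the question's wished-for method), the parameter names, and the equal-length check are all assumptions, not part of any existing API, and it has not been benchmarked.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class StringTranslateExtensions
{
    // Hypothetical helper: maps each character of `from` to the character
    // at the same index in `to`, like the Unix tr command.
    public static string TrReplace(this string source, string from, string to)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (from == null) throw new ArgumentNullException(nameof(from));
        if (to == null) throw new ArgumentNullException(nameof(to));
        if (from.Length != to.Length)
            throw new ArgumentException("from and to must have the same length.");

        var mappings = new Dictionary<char, char>(from.Length);
        for (var i = 0; i < from.Length; i++)
            mappings[from[i]] = to[i]; // a later duplicate overrides an earlier one

        var searchChars = from.ToCharArray();
        var startIndex = 0;
        int currentIndex;
        StringBuilder result = null; // allocated only once a replacement is needed

        while ((currentIndex = source.IndexOfAny(searchChars, startIndex)) != -1)
        {
            if (result == null)
                result = new StringBuilder(source.Length);

            // Copy the untouched run, then the mapped character.
            result.Append(source, startIndex, currentIndex - startIndex);
            result.Append(mappings[source[currentIndex]]);
            startIndex = currentIndex + 1;
        }

        if (result == null)
            return source; // nothing to replace

        // Copy whatever follows the last replaced character.
        result.Append(source, startIndex, source.Length - startIndex);
        return result.ToString();
    }
}
```

With this in scope, "a stríng with inválid charactérs".TrReplace("áéíüñ", "aeiu&") returns "a string with invalid characters". The dictionary and search array are rebuilt on every call, so for a fixed mapping used in a hot loop the static fields shown above are likely the better choice.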

Assuming that NOT every character is a replacement char, this will actually run slightly faster than straight string replacements (about 20%).

That being said, when considering performance cost, remember what we are actually talking about: in this case, the difference between the optimized solution and the original solution is about 1 second over 100,000 iterations on a 1,000-character string.
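If you want to check numbers like these on your own machine, a rough Stopwatch harness along the following lines may help. It is a sketch, not the code used for the figures above: the class ReplaceBenchmark, its method names, and the synthetic input (one mapped character every 50 positions) are assumptions made for illustration, and results will vary with runtime and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ReplaceBenchmark
{
    // Baseline from the question: one String.Replace pass per mapped character.
    public static string ChainedReplace(string s)
    {
        return s.Replace("á", "a").Replace("é", "e").Replace("í", "i")
                .Replace("ü", "u").Replace("ñ", "&");
    }

    // Runs `translate` on `input` `iterations` times and returns elapsed milliseconds.
    public static long Measure(Func<string, string> translate, string input, int iterations)
    {
        translate(input); // warm-up call so JIT compilation is not timed

        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
            translate(input);
        watch.Stop();

        return watch.ElapsedMilliseconds;
    }

    // Builds a string of the given length with an occasional mapped character.
    public static string BuildInput(int length)
    {
        var sb = new StringBuilder(length);
        for (var i = 0; i < length; i++)
            sb.Append(i % 50 == 0 ? 'é' : 'x');
        return sb.ToString();
    }
}
```

Calling ReplaceBenchmark.Measure(ReplaceBenchmark.ChainedReplace, ReplaceBenchmark.BuildInput(1000), 100000) times the baseline; passing Translate as the first argument (from inside the class that declares it, since it is private) times the IndexOfAny version on the same input.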

Either way, just wanted to add some information to the answers for this question.

Chris Baxter answered Oct 18 '22 04:10