
String.Replace doesn't replace all matches

Why does the replacement for line2 affect only every other occurrence?

    Dim line1 As String = "AAA|BBB|CCC|CCC|CCC|CCC|EEE|FFF"
    Dim line2 As String = "AAA|BBB|CCC|CCC|CCC|CCC|EEE|FFF"
    Dim line3 As String = "AAA|BBB|CCC|CCC|CCC|CCC|EEE|FFF"

    line1 = line1.Replace("CCC", "")
    line2 = line2.Replace("|CCC|", "||")
    line3 = line3.Replace("CCC|", "|")

Result:

line1 = "AAA|BBB|||||EEE|FFF" -- OK, but fails when an element is "..|ZZZCCCZZZ|.."
line2 = "AAA|BBB||CCC||CCC|EEE|FFF" -- Not OK
line3 = "AAA|BBB|||||EEE|FFF" -- OK, but fails like line1 when an element is "..|ZZZCCC|.."

I have tried using Regex, but I get similar results.

Is there really no better way than the loop below?

    Do While line1.Contains("|CCC|")
        line1 = line1.Replace("|CCC|", "||")
    Loop

Leon asked Feb 05 '13 17:02


3 Answers

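Replace scans left to right and its matches never overlap: once it finds a token, it resumes searching after the end of that token. So it finds |CCC|, replaces it, then continues on, and the first thing it sees is CCC|, which doesn't match because the leading | was already consumed by the previous match. It doesn't pre-scan the string looking for every position where the token could be replaced.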

Consider it like this:

Given AAA|BBB|CCC|CCC|CCC|CCC|EEE|FFF

It runs to AAA|BBB|CCC| HOLD IT |CCC| was found, let's start building our string:

AAA|BBB + || (our replacement)

Now let's move on, we now have CCC|CCC|CCC|EEE|FFF left to work with.

It runs to CCC|CCC| HOLD IT |CCC| was found, let's continue adding to our string:

AAA|BBB||CCC + || (our replacement)

Now let's move on. We now have CCC|EEE|FFF left, which contains no further |CCC|, so it is appended unchanged and we end up with AAA|BBB||CCC||CCC|EEE|FFF.
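
The walk-through above can be reproduced with a small sketch. This is not how the framework implements Replace internally; it only mimics the same left-to-right, non-overlapping scan with IndexOf. The module and variable names are made up for illustration.

```vbnet
Imports System

Module ScanDemo
    Sub Main()
        Dim input As String = "AAA|BBB|CCC|CCC|CCC|CCC|EEE|FFF"
        Dim token As String = "|CCC|"
        Dim pos As Integer = 0

        Do
            ' Look for the next token at or after the current position.
            Dim hit As Integer = input.IndexOf(token, pos, StringComparison.Ordinal)
            If hit < 0 Then Exit Do

            Console.WriteLine("match at index " & hit)

            ' Resume after the whole match, so its trailing "|" is not
            ' available as the leading "|" of the next match.
            pos = hit + token.Length
        Loop
    End Sub
End Module
```

For this input it reports matches at index 7 and index 15 only, which is why two of the four CCC elements survive.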

EDIT: Considering the entry on MSDN describing the return value:

A string that is equivalent to the current string except that all instances of oldValue are replaced with newValue.

One could read that the way you expected: that it pre-scans the string and finds all matches, including overlapping ones. I don't see anything in the MSDN doc that describes this corner case. Perhaps it should be added to the MSDN doc.
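
If a single pass is wanted, one option (not taken from the MSDN text, just a sketch) is Regex.Replace with lookarounds, which check for the delimiters without consuming them. The pattern and the extra ZZZCCCZZZ element below are assumptions added for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaroundDemo
    Sub Main()
        Dim line As String = "AAA|BBB|CCC|CCC|CCC|CCC|EEE|FFF|ZZZCCCZZZ"

        ' (?<=^|\|) : preceded by start of string or "|" (not consumed)
        ' (?=\||$)  : followed by "|" or end of string (not consumed)
        Dim result As String = Regex.Replace(line, "(?<=^|\|)CCC(?=\||$)", "")

        Console.WriteLine(result) ' AAA|BBB|||||EEE|FFF|ZZZCCCZZZ
    End Sub
End Module
```

Because the pipes stay outside the match, adjacent CCC elements are all found in one scan, and ZZZCCCZZZ is left alone.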

Chris Sinclair answered Oct 18 '22 03:10


Instead of using regular expressions or string.Replace you could parse the values, filter the ones you don't want and join them back together.

    line1 = string.Join("|", line1.Split('|').Select(s => s == "CCC" ? "" : s).ToArray());

Sorry I don't know the VB equivalent.
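
A VB.NET rendering of the same idea might look like the following. It is a minimal sketch, assuming VB 2008 or later for the lambda and the If() operator; the module name is made up for illustration.

```vbnet
Imports System
Imports System.Linq

Module SplitJoinDemo
    Sub Main()
        Dim line1 As String = "AAA|BBB|CCC|CCC|CCC|CCC|EEE|FFF"

        ' Split on "|", blank out fields that are exactly "CCC", then rejoin.
        Dim fields = line1.Split("|"c).Select(Function(s) If(s = "CCC", "", s)).ToArray()
        line1 = String.Join("|", fields)

        Console.WriteLine(line1) ' AAA|BBB|||||EEE|FFF
    End Sub
End Module
```

Since each field is compared as a whole, an element such as ZZZCCCZZZ is not touched.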

juharr answered Oct 18 '22 01:10


For anyone in the future, I've added an extension method to overcome this limitation in the framework:

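    Imports System
    Imports System.Runtime.CompilerServices

    ' Extension methods must be declared inside a Module.
    Public Module StringExtensions

        <Extension()>
        Public Function ReplaceAll(ByVal original As String, ByVal oldValue As String, ByVal newValue As String) As String

            ' If newValue contains oldValue, every pass reintroduces a match.
            If newValue.Contains(oldValue) Then
                Throw New ArgumentException("newValue can't contain oldValue, as infinite replacements can occur.", "newValue")
            End If

            ' Safety cap on the number of passes; inputs that need more
            ' passes than this come back only partially replaced.
            Dim maxIterations As Integer = original.Length \ oldValue.Length

            ' Repeat Replace until no match remains or the cap is reached.
            While maxIterations > 0 AndAlso original.Contains(oldValue)
                original = original.Replace(oldValue, newValue)
                maxIterations -= 1
            End While

            Return original

        End Function

    End Module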

Leon answered Oct 18 '22 01:10