Is it more or less efficient to perform a check before performing a Replace in C#?

Tags: c#, text, replace

This is an almost academic question but I'm curious as to its answer.

Suppose you have a loop that performs a routine replace on every row in a dataset. Let's say there are 10,000 such rows.

Is it more efficient to have something like this:

 Row = Row.Replace('X', 'Y');

Or to check whether the row even contains the character that is to be replaced in the first place, like this:

 if (Row.Contains('X')) Row = Row.Replace('X', 'Y');

Is there any difference in terms of efficiency? I realize that the difference might be very minor, but I'm interested in knowing whether one way is better than the other, regardless of how much better it may be. Also, would your answer be different if the probability of finding the character to be replaced were 10% rather than 90%?

asked Jul 15 '11 15:07 by GonzoKnight



2 Answers

Your check, Row.Contains('X'), is an O(n) function: it walks the string one character at a time until it finds the character, so it reads the entire string whenever the character is absent.

Row.Replace('X', 'Y') works the same way, except that it always has to examine every character.

So, if you have that check in place, you iterate over the string potentially twice. If you just replace, you iterate over the string once.
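To make the "one pass versus up to two passes" point concrete, the two variants from the question can be put side by side. This is only a minimal sketch: it assumes a runtime where string.Contains(char) is available (.NET Core 2.1 or later) and C# top-level statements, and the helper names ReplaceDirect and ReplaceWithCheck are made up for illustration.

```csharp
using System;

// Variant 1: always call Replace; the string is scanned once.
static string ReplaceDirect(string row) => row.Replace('X', 'Y');

// Variant 2: check first. When 'X' is present, Contains scans up to the
// first match and Replace then scans the string again.
static string ReplaceWithCheck(string row) =>
    row.Contains('X') ? row.Replace('X', 'Y') : row;

string[] rows = { "aXbXc", "no match here", "" };
foreach (string row in rows)
{
    // Both variants produce the same text; only the amount of work differs.
    Console.WriteLine($"{ReplaceDirect(row)} | {ReplaceWithCheck(row)}");
}
```

Since the output is identical either way, the choice between the two is purely a performance question.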

answered Oct 20 '22 23:10 by Mike Richards


You need to measure on a realistic dataset first, then decide which performs better. If your typical rows rarely contain the character, then the Contains() call may be faster: Replace also iterates through all the chars in the string, and it may additionally create an extra string object that then has to be garbage collected, because strings are immutable. (Whether that allocation happens when nothing is replaced depends on the runtime; recent .NET versions return the original instance in that case, which removes most of the benefit of the check.) If "X" is often present, though, the check becomes a waste and actually slows things down.
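A rough way to take that measurement is a Stopwatch loop over synthetic rows at the two hit rates the question mentions. Everything here is an assumption made for illustration: the row length of 200, the fixed seed, and the helper names BuildRows and Measure. A single Stopwatch run is also noisy, so treat the numbers as indicative only and prefer a proper benchmarking harness for real decisions.

```csharp
using System;
using System.Diagnostics;

// Build `count` rows of `length` chars; roughly `hitRate` of them get one 'X'.
static string[] BuildRows(int count, int length, double hitRate, int seed)
{
    var random = new Random(seed);
    var rows = new string[count];
    for (int i = 0; i < count; i++)
    {
        char[] chars = new string('a', length).ToCharArray();
        if (random.NextDouble() < hitRate)
        {
            chars[random.Next(length)] = 'X';
        }
        rows[i] = new string(chars);
    }
    return rows;
}

// Time one pass over the rows; the summed length keeps the result in use.
static (TimeSpan Elapsed, long TotalLength) Measure(
    string[] rows, Func<string, string> transform)
{
    long totalLength = 0;
    var stopwatch = Stopwatch.StartNew();
    foreach (string row in rows)
    {
        totalLength += transform(row).Length;
    }
    stopwatch.Stop();
    return (stopwatch.Elapsed, totalLength);
}

foreach (double hitRate in new[] { 0.1, 0.9 })
{
    string[] rows = BuildRows(10_000, 200, hitRate, seed: 42);
    var direct = Measure(rows, row => row.Replace('X', 'Y'));
    var guarded = Measure(
        rows, row => row.Contains('X') ? row.Replace('X', 'Y') : row);
    Console.WriteLine(
        $"hit rate {hitRate}: direct {direct.Elapsed.TotalMilliseconds:F3} ms, " +
        $"guarded {guarded.Elapsed.TotalMilliseconds:F3} ms");
}
```

Swapping the synthetic rows for a sample of your real data is the important step; the shape and length of the actual strings can change the outcome.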

Also, this typically isn't the first place to look for and worry about performance problems. Things like chatty interfaces, network I/O, web services, databases, file I/O and GUI updates are going to hurt you orders of magnitude more than stuff like this.

If you were going to do stuff like this, and if Row came back from a database (as its name suggests), then getting the database to do the filtering might be another way to improve performance. E.g.

select MyTextColumn from MyTable where MyTextColumn like '%X%'

Then perform the replacement on all the results, because you know you only returned results where the replacement was needed.

This does introduce other concerns though - for example, in SQL Server, if MyTextColumn were indexed, SQL Server wouldn't be able to seek on that index because the LIKE pattern starts with a wildcard (the predicate is not considered to be "sargable").
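The "filter in the database, then replace on every result" idea described above might look roughly like this on the C# side. It is a sketch under assumptions: the query results are taken to arrive as an ADO.NET IDataReader with the text in the first column, an in-memory DataTable stands in for a real database, and ReplaceInResults is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Apply the replacement to every row without checking first: the query is
// assumed to have returned only rows that contain the character (which also
// means no NULL values reach GetString).
static List<string> ReplaceInResults(IDataReader reader, char oldChar, char newChar)
{
    var results = new List<string>();
    while (reader.Read())
    {
        results.Add(reader.GetString(0).Replace(oldChar, newChar));
    }
    return results;
}

// Stand-in for real query results: an in-memory table exposed as a reader.
var table = new DataTable();
table.Columns.Add("MyTextColumn", typeof(string));
table.Rows.Add("aXb");
table.Rows.Add("XX");

using IDataReader queryResults = table.CreateDataReader();
foreach (string value in ReplaceInResults(queryResults, 'X', 'Y'))
{
    Console.WriteLine(value); // prints aYb, then YY
}
```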

In summary, write for correctness, readability and maintenance first, then measure performance and make targeted improvements where they are found to be required.

answered Oct 20 '22 22:10 by Neil Barnwell