I'm calling a REST API and am receiving an XML response back. It returns a list of a workspace names, and I'm writing a quick IsExistingWorkspace()
method. Since all workspaces consist of contiguous characters with no whitespace, I'm assuming the easiest way to find out if a particular workspace is in the list is to remove all whitespace (including newlines) and doing this (XML is the string received from the web request):
XML.Contains("<name>" + workspaceName + "</name>");
I know it's case-sensitive, and I'm relying on that. I just need a way to remove all whitespace in a string efficiently. I know RegEx and LINQ can do it, but I'm open to other ideas. I am mostly just concerned about speed.
strip() Python String strip() function will remove leading and trailing whitespaces. If you want to remove only leading or trailing spaces, use lstrip() or rstrip() function instead.
This is fastest way I know of, even though you said you didn't want to use regular expressions:
Regex.Replace(XML, @"\s+", "");
Crediting @hypehuman in the comments, if you plan to do this more than once, create and store a Regex instance. This will save the overhead of constructing it every time, which is more expensive than you might think.
private static readonly Regex sWhitespace = new Regex(@"\s+");
public static string ReplaceWhitespace(string input, string replacement)
{
return sWhitespace.Replace(input, replacement);
}
I have an alternative way without regexp, and it seems to perform pretty good. It is a continuation on Brandon Moretz answer:
public static string RemoveWhitespace(this string input)
{
return new string(input.ToCharArray()
.Where(c => !Char.IsWhiteSpace(c))
.ToArray());
}
I tested it in a simple unit test:
[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace1(string input, string expected)
{
string s = null;
for (int i = 0; i < 1000000; i++)
{
s = input.RemoveWhitespace();
}
Assert.AreEqual(expected, s);
}
[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace2(string input, string expected)
{
string s = null;
for (int i = 0; i < 1000000; i++)
{
s = Regex.Replace(input, @"\s+", "");
}
Assert.AreEqual(expected, s);
}
For 1,000,000 attempts the first option (without regexp) runs in less then a second (700 ms on my machine), and the second takes 3.5 seconds.
Try the replace method of the string in C#.
XML.Replace(" ", string.Empty);
My solution is to use Split and Join and it is surprisingly fast, in fact the fastest of the top answers here.
str = string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
Timings for 10,000 loop on simple string with whitespace inc new lines and tabs
Improve this by wrapping it up in method to give it meaning, and also make it an extension method while we are at it ...
public static string RemoveWhitespace(this string str) {
return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
}
Building on Henks answer I have created some test methods with his answer and some added, more optimized, methods. I found the results differ based on the size of the input string. Therefore, I have tested with two result sets. In the fastest method, the linked source has a even faster way. But, since it is characterized as unsafe I have left this out.
Long input string results:
Short input string results:
Code:
public class RemoveWhitespace
{
public static string RemoveStringReader(string input)
{
var s = new StringBuilder(input.Length); // (input.Length);
using (var reader = new StringReader(input))
{
int i = 0;
char c;
for (; i < input.Length; i++)
{
c = (char)reader.Read();
if (!char.IsWhiteSpace(c))
{
s.Append(c);
}
}
}
return s.ToString();
}
public static string RemoveLinqNativeCharIsWhitespace(string input)
{
return new string(input.ToCharArray()
.Where(c => !char.IsWhiteSpace(c))
.ToArray());
}
public static string RemoveLinq(string input)
{
return new string(input.ToCharArray()
.Where(c => !Char.IsWhiteSpace(c))
.ToArray());
}
public static string RemoveRegex(string input)
{
return Regex.Replace(input, @"\s+", "");
}
private static Regex compiled = new Regex(@"\s+", RegexOptions.Compiled);
public static string RemoveRegexCompiled(string input)
{
return compiled.Replace(input, "");
}
public static string RemoveForLoop(string input)
{
for (int i = input.Length - 1; i >= 0; i--)
{
if (char.IsWhiteSpace(input[i]))
{
input = input.Remove(i, 1);
}
}
return input;
}
public static string StringSplitThenJoin(this string str)
{
return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
}
public static string RemoveInPlaceCharArray(string input)
{
var len = input.Length;
var src = input.ToCharArray();
int dstIdx = 0;
for (int i = 0; i < len; i++)
{
var ch = src[i];
switch (ch)
{
case '\u0020':
case '\u00A0':
case '\u1680':
case '\u2000':
case '\u2001':
case '\u2002':
case '\u2003':
case '\u2004':
case '\u2005':
case '\u2006':
case '\u2007':
case '\u2008':
case '\u2009':
case '\u200A':
case '\u202F':
case '\u205F':
case '\u3000':
case '\u2028':
case '\u2029':
case '\u0009':
case '\u000A':
case '\u000B':
case '\u000C':
case '\u000D':
case '\u0085':
continue;
default:
src[dstIdx++] = ch;
break;
}
}
return new string(src, 0, dstIdx);
}
}
Tests:
[TestFixture]
public class Test
{
// Short input
//private const string input = "123 123 \t 1adc \n 222";
//private const string expected = "1231231adc222";
// Long input
private const string input = "123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222";
private const string expected = "1231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc222";
private const int iterations = 1000000;
[Test]
public void RemoveInPlaceCharArray()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveInPlaceCharArray(input);
}
stopwatch.Stop();
Console.WriteLine("InPlaceCharArray: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveStringReader()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveStringReader(input);
}
stopwatch.Stop();
Console.WriteLine("String reader: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveLinqNativeCharIsWhitespace()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveLinqNativeCharIsWhitespace(input);
}
stopwatch.Stop();
Console.WriteLine("LINQ using native char.IsWhitespace: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveLinq()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveLinq(input);
}
stopwatch.Stop();
Console.WriteLine("LINQ: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveRegex()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveRegex(input);
}
stopwatch.Stop();
Console.WriteLine("Regex: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveRegexCompiled()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveRegexCompiled(input);
}
stopwatch.Stop();
Console.WriteLine("RegexCompiled: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[Test]
public void RemoveForLoop()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.RemoveForLoop(input);
}
stopwatch.Stop();
Console.WriteLine("ForLoop: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
[TestMethod]
public void StringSplitThenJoin()
{
string s = null;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
s = RemoveWhitespace.StringSplitThenJoin(input);
}
stopwatch.Stop();
Console.WriteLine("StringSplitThenJoin: " + stopwatch.ElapsedMilliseconds + " ms");
Assert.AreEqual(expected, s);
}
}
Edit: Tested a nice one liner from Kernowcode.
Just an alternative because it looks quite nice :) - NOTE: Henks answer is the quickest of these.
input.ToCharArray()
.Where(c => !Char.IsWhiteSpace(c))
.Select(c => c.ToString())
.Aggregate((a, b) => a + b);
Testing 1,000,000 loops on "This is a simple Test"
This method = 1.74 seconds
Regex = 2.58 secondsnew String
(Henks) = 0.82 seconds
I found a nice write-up on this on CodeProject by Felipe Machado (with help by Richard Robertson)
He tested ten different methods. This one is the fastest safe version...
public static string TrimAllWithInplaceCharArray(string str) {
var len = str.Length;
var src = str.ToCharArray();
int dstIdx = 0;
for (int i = 0; i < len; i++) {
var ch = src[i];
switch (ch) {
case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
continue;
default:
src[dstIdx++] = ch;
break;
}
}
return new string(src, 0, dstIdx);
}
And the fastest unsafe version... (some inprovements by Sunsetquest 5/26/2021 )
public static unsafe void RemoveAllWhitespace(ref string str)
{
fixed (char* pfixed = str)
{
char* dst = pfixed;
for (char* p = pfixed; *p != 0; p++)
{
switch (*p)
{
case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
continue;
default:
*dst++ = *p;
break;
}
}
uint* pi = (uint*)pfixed;
ulong len = ((ulong)dst - (ulong)pfixed) >> 1;
pi[-1] = (uint)len;
pfixed[len] = '\0';
}
}
There are also some nice independent benchmarks on Stack Overflow by Stian Standahl that also show how Felipe's function is about 300% faster than the next fastest function. Also, for the one I modified, I used this trick.
If you need superb performance, you should avoid LINQ and regular expressions in this case. I did some performance benchmarking, and it seems that if you want to strip white space from beginning and end of the string, string.Trim() is your ultimate function.
If you need to strip all white spaces from a string, the following method works fastest of all that has been posted here:
public static string RemoveWhitespace(this string input)
{
int j = 0, inputlen = input.Length;
char[] newarr = new char[inputlen];
for (int i = 0; i < inputlen; ++i)
{
char tmp = input[i];
if (!char.IsWhiteSpace(tmp))
{
newarr[j] = tmp;
++j;
}
}
return new String(newarr, 0, j);
}
Regex is overkill; just use extension on string (thanks Henk). This is trivial and should have been part of the framework. Anyhow, here's my implementation:
public static partial class Extension
{
public static string RemoveWhiteSpace(this string self)
{
return new string(self.Where(c => !Char.IsWhiteSpace(c)).ToArray());
}
}
Here is a simple linear alternative to the RegEx solution. I am not sure which is faster; you'd have to benchmark it.
static string RemoveWhitespace(string input)
{
StringBuilder output = new StringBuilder(input.Length);
for (int index = 0; index < input.Length; index++)
{
if (!Char.IsWhiteSpace(input, index))
{
output.Append(input[index]);
}
}
return output.ToString();
}
I needed to replace white space in a string with spaces, but not duplicate spaces. e.g., I needed to convert something like the following:
"a b c\r\n d\t\t\t e"
to
"a b c d e"
I used the following method
private static string RemoveWhiteSpace(string value)
{
if (value == null) { return null; }
var sb = new StringBuilder();
var lastCharWs = false;
foreach (var c in value)
{
if (char.IsWhiteSpace(c))
{
if (lastCharWs) { continue; }
sb.Append(' ');
lastCharWs = true;
}
else
{
sb.Append(c);
lastCharWs = false;
}
}
return sb.ToString();
}
I think alot of persons come here for removing spaces. :
string s = "my string is nice";
s = s.replace(" ", "");
I assume your XML response looks like this:
var xml = @"<names>
<name>
foo
</name>
<name>
bar
</name>
</names>";
The best way to process XML is to use an XML parser, such as LINQ to XML:
var doc = XDocument.Parse(xml);
var containsFoo = doc.Root
.Elements("name")
.Any(e => ((string)e).Trim() == "foo");
We can use:
public static string RemoveWhitespace(this string input)
{
if (input == null)
return null;
return new string(input.ToCharArray()
.Where(c => !Char.IsWhiteSpace(c))
.ToArray());
}
Using Linq, you can write a readable method this way :
public static string RemoveAllWhitespaces(this string source)
{
return string.IsNullOrEmpty(source) ? source : new string(source.Where(x => !char.IsWhiteSpace(x)).ToArray());
}
Here is yet another variant:
public static string RemoveAllWhitespace(string aString)
{
return String.Join(String.Empty, aString.Where(aChar => aChar !Char.IsWhiteSpace(aChar)));
}
As with most of the other solutions, I haven't performed exhaustive benchmark tests, but this works well enough for my purposes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With