Efficient way to remove ALL whitespace from String?

Tags:

I'm calling a REST API and am receiving an XML response back. It returns a list of a workspace names, and I'm writing a quick IsExistingWorkspace() method. Since all workspaces consist of contiguous characters with no whitespace, I'm assuming the easiest way to find out if a particular workspace is in the list is to remove all whitespace (including newlines) and doing this (XML is the string received from the web request):

XML.Contains("<name>" + workspaceName + "</name>");

I know it's case-sensitive, and I'm relying on that. I just need a way to remove all whitespace in a string efficiently. I know RegEx and LINQ can do it, but I'm open to other ideas. I am mostly just concerned about speed.

723

asked Jun 02 '11 19:06

Corey Ogburn

16 Answers

This is fastest way I know of, even though you said you didn't want to use regular expressions:

Regex.Replace(XML, @"\s+", "");

Crediting @hypehuman in the comments, if you plan to do this more than once, create and store a Regex instance. This will save the overhead of constructing it every time, which is more expensive than you might think.

private static readonly Regex sWhitespace = new Regex(@"\s+");
public static string ReplaceWhitespace(string input, string replacement) 
{
    return sWhitespace.Replace(input, replacement);
}

125

answered Oct 04 '22 09:10

slandau

I have an alternative way without regexp, and it seems to perform pretty good. It is a continuation on Brandon Moretz answer:

 public static string RemoveWhitespace(this string input)
 {
    return new string(input.ToCharArray()
        .Where(c => !Char.IsWhiteSpace(c))
        .ToArray());
 }

I tested it in a simple unit test:

[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace1(string input, string expected)
{
    string s = null;
    for (int i = 0; i < 1000000; i++)
    {
        s = input.RemoveWhitespace();
    }
    Assert.AreEqual(expected, s);
}

[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace2(string input, string expected)
{
    string s = null;
    for (int i = 0; i < 1000000; i++)
    {
        s = Regex.Replace(input, @"\s+", "");
    }
    Assert.AreEqual(expected, s);
}

For 1,000,000 attempts the first option (without regexp) runs in less then a second (700 ms on my machine), and the second takes 3.5 seconds.

answered Oct 04 '22 07:10

Henk J Meulekamp

Try the replace method of the string in C#.

XML.Replace(" ", string.Empty);

answered Oct 04 '22 08:10

Mike_K

My solution is to use Split and Join and it is surprisingly fast, in fact the fastest of the top answers here.

str = string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));

Timings for 10,000 loop on simple string with whitespace inc new lines and tabs

split/join = 60 milliseconds
linq chararray = 94 milliseconds
regex = 437 milliseconds

Improve this by wrapping it up in method to give it meaning, and also make it an extension method while we are at it ...

public static string RemoveWhitespace(this string str) {
    return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
}

answered Oct 04 '22 07:10

kernowcode

Building on Henks answer I have created some test methods with his answer and some added, more optimized, methods. I found the results differ based on the size of the input string. Therefore, I have tested with two result sets. In the fastest method, the linked source has a even faster way. But, since it is characterized as unsafe I have left this out.

Long input string results:

InPlaceCharArray: 2021 ms (Sunsetquest's answer) - (Original source)
String split then join: 4277ms (Kernowcode's answer)
String reader: 6082 ms
LINQ using native char.IsWhitespace: 7357 ms
LINQ: 7746 ms (Henk's answer)
ForLoop: 32320 ms
RegexCompiled: 37157 ms
Regex: 42940 ms

Short input string results:

InPlaceCharArray: 108 ms (Sunsetquest's answer) - (Original source)
String split then join: 294 ms (Kernowcode's answer)
String reader: 327 ms
ForLoop: 343 ms
LINQ using native char.IsWhitespace: 624 ms
LINQ: 645ms (Henk's answer)
RegexCompiled: 1671 ms
Regex: 2599 ms

Code:

public class RemoveWhitespace
{
    public static string RemoveStringReader(string input)
    {
        var s = new StringBuilder(input.Length); // (input.Length);
        using (var reader = new StringReader(input))
        {
            int i = 0;
            char c;
            for (; i < input.Length; i++)
            {
                c = (char)reader.Read();
                if (!char.IsWhiteSpace(c))
                {
                    s.Append(c);
                }
            }
        }

        return s.ToString();
    }

    public static string RemoveLinqNativeCharIsWhitespace(string input)
    {
        return new string(input.ToCharArray()
            .Where(c => !char.IsWhiteSpace(c))
            .ToArray());
    }

    public static string RemoveLinq(string input)
    {
        return new string(input.ToCharArray()
            .Where(c => !Char.IsWhiteSpace(c))
            .ToArray());
    }

    public static string RemoveRegex(string input)
    {
        return Regex.Replace(input, @"\s+", "");
    }

    private static Regex compiled = new Regex(@"\s+", RegexOptions.Compiled);
    public static string RemoveRegexCompiled(string input)
    {
        return compiled.Replace(input, "");
    }

    public static string RemoveForLoop(string input)
    {
        for (int i = input.Length - 1; i >= 0; i--)
        {
            if (char.IsWhiteSpace(input[i]))
            {
                input = input.Remove(i, 1);
            }
        }
        return input;
    }

    public static string StringSplitThenJoin(this string str)
    {
        return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
    }

    public static string RemoveInPlaceCharArray(string input)
    {
        var len = input.Length;
        var src = input.ToCharArray();
        int dstIdx = 0;
        for (int i = 0; i < len; i++)
        {
            var ch = src[i];
            switch (ch)
            {
                case '\u0020':
                case '\u00A0':
                case '\u1680':
                case '\u2000':
                case '\u2001':
                case '\u2002':
                case '\u2003':
                case '\u2004':
                case '\u2005':
                case '\u2006':
                case '\u2007':
                case '\u2008':
                case '\u2009':
                case '\u200A':
                case '\u202F':
                case '\u205F':
                case '\u3000':
                case '\u2028':
                case '\u2029':
                case '\u0009':
                case '\u000A':
                case '\u000B':
                case '\u000C':
                case '\u000D':
                case '\u0085':
                    continue;
                default:
                    src[dstIdx++] = ch;
                    break;
            }
        }
        return new string(src, 0, dstIdx);
    }
}

Tests:

[TestFixture]
public class Test
{
    // Short input
    //private const string input = "123 123 \t 1adc \n 222";
    //private const string expected = "1231231adc222";

    // Long input
    private const string input = "123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222";
    private const string expected = "1231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc222";

    private const int iterations = 1000000;

    [Test]
    public void RemoveInPlaceCharArray()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveInPlaceCharArray(input);
        }

        stopwatch.Stop();
        Console.WriteLine("InPlaceCharArray: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveStringReader()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveStringReader(input);
        }

        stopwatch.Stop();
        Console.WriteLine("String reader: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveLinqNativeCharIsWhitespace()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveLinqNativeCharIsWhitespace(input);
        }

        stopwatch.Stop();
        Console.WriteLine("LINQ using native char.IsWhitespace: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveLinq()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveLinq(input);
        }

        stopwatch.Stop();
        Console.WriteLine("LINQ: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveRegex()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveRegex(input);
        }

        stopwatch.Stop();
        Console.WriteLine("Regex: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveRegexCompiled()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveRegexCompiled(input);
        }

        stopwatch.Stop();
        Console.WriteLine("RegexCompiled: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveForLoop()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveForLoop(input);
        }

        stopwatch.Stop();
        Console.WriteLine("ForLoop: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [TestMethod]
    public void StringSplitThenJoin()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.StringSplitThenJoin(input);
        }

        stopwatch.Stop();
        Console.WriteLine("StringSplitThenJoin: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }
}

Edit: Tested a nice one liner from Kernowcode.

answered Oct 04 '22 07:10

Stian Standahl

Just an alternative because it looks quite nice :) - NOTE: Henks answer is the quickest of these.

input.ToCharArray()
 .Where(c => !Char.IsWhiteSpace(c))
 .Select(c => c.ToString())
 .Aggregate((a, b) => a + b);

Testing 1,000,000 loops on "This is a simple Test"

This method = 1.74 seconds
Regex = 2.58 seconds
new String (Henks) = 0.82 seconds

answered Oct 04 '22 08:10

BlueChippy

I found a nice write-up on this on CodeProject by Felipe Machado (with help by Richard Robertson)

He tested ten different methods. This one is the fastest safe version...

public static string TrimAllWithInplaceCharArray(string str) {

    var len = str.Length;
    var src = str.ToCharArray();
    int dstIdx = 0;

    for (int i = 0; i < len; i++) {
        var ch = src[i];

        switch (ch) {

            case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':

            case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':

            case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':

            case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':

            case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
                continue;

            default:
                src[dstIdx++] = ch;
                break;
        }
    }
    return new string(src, 0, dstIdx);
}

And the fastest unsafe version... (some inprovements by Sunsetquest 5/26/2021 )

public static unsafe void RemoveAllWhitespace(ref string str)
{
    fixed (char* pfixed = str)
    {
        char* dst = pfixed;
        for (char* p = pfixed; *p != 0; p++)
        {
            switch (*p)
            {
                case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
                case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
                case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
                case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
                case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
                continue;

                default:
                    *dst++ = *p;
                    break;
            }
        }

        uint* pi = (uint*)pfixed;
        ulong len = ((ulong)dst - (ulong)pfixed) >> 1;
        pi[-1] = (uint)len;
        pfixed[len] = '\0';
    }
}

There are also some nice independent benchmarks on Stack Overflow by Stian Standahl that also show how Felipe's function is about 300% faster than the next fastest function. Also, for the one I modified, I used this trick.

answered Oct 04 '22 07:10

SunsetQuest

If you need superb performance, you should avoid LINQ and regular expressions in this case. I did some performance benchmarking, and it seems that if you want to strip white space from beginning and end of the string, string.Trim() is your ultimate function.

If you need to strip all white spaces from a string, the following method works fastest of all that has been posted here:

    public static string RemoveWhitespace(this string input)
    {
        int j = 0, inputlen = input.Length;
        char[] newarr = new char[inputlen];

        for (int i = 0; i < inputlen; ++i)
        {
            char tmp = input[i];

            if (!char.IsWhiteSpace(tmp))
            {
                newarr[j] = tmp;
                ++j;
            }
        }
        return new String(newarr, 0, j);
    }

answered Oct 04 '22 08:10

JHM

Regex is overkill; just use extension on string (thanks Henk). This is trivial and should have been part of the framework. Anyhow, here's my implementation:

public static partial class Extension
{
    public static string RemoveWhiteSpace(this string self)
    {
        return new string(self.Where(c => !Char.IsWhiteSpace(c)).ToArray());
    }
}

answered Oct 04 '22 08:10

Maksood

Here is a simple linear alternative to the RegEx solution. I am not sure which is faster; you'd have to benchmark it.

static string RemoveWhitespace(string input)
{
    StringBuilder output = new StringBuilder(input.Length);

    for (int index = 0; index < input.Length; index++)
    {
        if (!Char.IsWhiteSpace(input, index))
        {
            output.Append(input[index]);
        }
    }
    return output.ToString();
}

answered Oct 04 '22 08:10

Brandon Moretz

I needed to replace white space in a string with spaces, but not duplicate spaces. e.g., I needed to convert something like the following:

"a b   c\r\n d\t\t\t e"

"a b c d e"

I used the following method

private static string RemoveWhiteSpace(string value)
{
    if (value == null) { return null; }
    var sb = new StringBuilder();

    var lastCharWs = false;
    foreach (var c in value)
    {
        if (char.IsWhiteSpace(c))
        {
            if (lastCharWs) { continue; }
            sb.Append(' ');
            lastCharWs = true;
        }
        else
        {
            sb.Append(c);
            lastCharWs = false;
        }
    }
    return sb.ToString();
}

answered Oct 04 '22 08:10

user1325543

I think alot of persons come here for removing spaces. :

string s = "my string is nice";
s = s.replace(" ", "");

answered Oct 04 '22 09:10

larsemil

I assume your XML response looks like this:

var xml = @"<names>
                <name>
                    foo
                </name>
                <name>
                    bar
                </name>
            </names>";

The best way to process XML is to use an XML parser, such as LINQ to XML:

var doc = XDocument.Parse(xml);

var containsFoo = doc.Root
                     .Elements("name")
                     .Any(e => ((string)e).Trim() == "foo");

answered Oct 04 '22 08:10

dtb

We can use:

    public static string RemoveWhitespace(this string input)
    {
        if (input == null)
            return null;
        return new string(input.ToCharArray()
            .Where(c => !Char.IsWhiteSpace(c))
            .ToArray());
    }

answered Oct 04 '22 09:10

Tarik BENARAB

Using Linq, you can write a readable method this way :

    public static string RemoveAllWhitespaces(this string source)
    {
        return string.IsNullOrEmpty(source) ? source : new string(source.Where(x => !char.IsWhiteSpace(x)).ToArray());
    }

answered Oct 04 '22 07:10

Kewin Remy

Here is yet another variant:

public static string RemoveAllWhitespace(string aString)
{
  return String.Join(String.Empty, aString.Where(aChar => aChar !Char.IsWhiteSpace(aChar)));
}

As with most of the other solutions, I haven't performed exhaustive benchmark tests, but this works well enough for my purposes.

answered Oct 04 '22 07:10

Fred

Related questions
                            
                                Why can't I use the 'await' operator within the body of a lock statement?
                            
                                Missing XML comment for publicly visible type or member
                            
                                What is the difference between "x is null" and "x == null"?
                            
                                One DbContext per web request... why?
                            
                                Split List into Sublists with LINQ
                            
                                Passing a single item as IEnumerable<T>
                            
                                When should I use a List vs a LinkedList
                            
                                What are the default access modifiers in C#?
                            
                                Internal vs. Private Access Modifiers
                            
                                Get MIME type from filename extension
                            
                                Is Task.Result the same as .GetAwaiter.GetResult()?
                            
                                Define: What is a HashSet?
                            
                                Reflecting parameter name: abuse of C# lambda expressions or syntax brilliance? [closed]
                            
                                How to convert an Stream into a byte[] in C#? [duplicate]
                            
                                How to get C# Enum description from value? [duplicate]
                            
                                Enum "Inheritance"
                            
                                Why is there no ForEach extension method on IEnumerable?
                            
                                C# generics syntax for multiple type parameter constraints [duplicate]
                            
                                C# getting the path of %AppData%
                            
                                Storing WPF Image Resources

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Efficient way to remove ALL whitespace from String?

Tags:

string

c#

whitespace

removing-whitespace

Corey Ogburn

People also ask

16 Answers

slandau

Henk J Meulekamp

Mike_K

kernowcode

Stian Standahl

BlueChippy

SunsetQuest

JHM

Maksood

Brandon Moretz

user1325543

larsemil

dtb

Tarik BENARAB

Kewin Remy

Fred

Recent Activity

Donate For Us