 

Fastest way to remove whitespace in a string

I'm fetching a string of comma-separated email addresses from a database table, but the string also contains whitespace, and I want to remove it quickly.

The following code does remove the whitespace, but it becomes slow when I fetch a large number of email addresses in one string (around 30,000) and then try to remove the whitespace between them. It takes more than four to five minutes to remove those spaces.

    // requires: using System.Text.RegularExpressions;
    Regex MultipleSpaces = new Regex(@"\s+", RegexOptions.Compiled);
    txtEmailID.Text = MultipleSpaces.Replace(emailaddress, "");
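Constructing a Regex with RegexOptions.Compiled is comparatively expensive, because the pattern is compiled to IL when the object is created. If the snippet above runs inside a loop or on every request, that cost is paid each time. The following is a minimal sketch, assuming the pattern can be built once and reused; `StripWhitespace` is a hypothetical helper name, and the sample addresses are made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Build the compiled pattern once. In a class this would typically be a
// private static readonly field; here a top-level local plays that role.
var whitespace = new Regex(@"\s+", RegexOptions.Compiled);

// Hypothetical helper: removes every run of whitespace using the shared instance.
string StripWhitespace(string input) => whitespace.Replace(input, "");

string emails = "a@example.com, b@example.com,\tc@example.com";
Console.WriteLine(StripWhitespace(emails)); // a@example.com,b@example.com,c@example.com
```

Reusing one instance (or calling the static Regex.Replace, which keeps a small cache of recently used patterns) avoids recompiling the same pattern over and over.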

Could anyone tell me how I can remove the whitespace in under a second, even for a large number of email addresses?

Joe asked Mar 05 '11 11:03


People also ask

How do I get rid of extra white spaces in a string?

If you are only dealing with excess whitespace at the beginning or end of the string, you can use trim(), ltrim() or rtrim() to remove it. If you are dealing with extra spaces within a string, consider preg_replace('/\s+/', ' ', $str), which replaces each run of whitespace with a single space.

How do I remove spaces between words in a string?

In Java, you can use the regex \\s+ to match runs of whitespace: replaceAll("\\s+", " ") collapses them into a single space, and replaceAll("\\s+", "") removes them entirely.

Which function is used to remove whitespace from the string?

In PHP, the trim() function removes whitespace and other predefined characters from both sides of a string.
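The snippets above use PHP and Java. In C#, the language of the question, roughly equivalent operations are string.Trim (plus TrimStart and TrimEnd) for the ends of a string and Regex.Replace for whitespace inside it. A minimal sketch with a made-up sample string:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "  alice@example.com ,   bob@example.com  ";

// Trim() strips leading and trailing whitespace (TrimStart/TrimEnd do one side).
string trimmed = text.Trim();

// Collapse every run of whitespace inside the string to a single space.
string collapsed = Regex.Replace(trimmed, @"\s+", " ");

// Or remove all whitespace entirely.
string removed = Regex.Replace(text, @"\s+", "");

Console.WriteLine($"[{trimmed}]");   // [alice@example.com ,   bob@example.com]
Console.WriteLine($"[{collapsed}]"); // [alice@example.com , bob@example.com]
Console.WriteLine($"[{removed}]");   // [alice@example.com,bob@example.com]
```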


2 Answers

I would build a custom extension method using StringBuilder, like:

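    using System.Collections.Generic;
    using System.Linq;   // Enumerable.Contains
    using System.Text;

    // Extension methods must be declared in a non-nested, non-generic static class.
    public static string ExceptChars(this string str, IEnumerable<char> toExclude)
    {
        StringBuilder sb = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            // Enumerable.Contains forwards to ICollection<T>.Contains when available,
            // so passing a HashSet<char> gives constant-time lookups.
            if (!toExclude.Contains(c))
                sb.Append(c);
        }
        return sb.ToString();
    }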

Usage:

    var str = s.ExceptChars(new[] { ' ', '\t', '\n', '\r' });

or to be even faster:

    var str = s.ExceptChars(new HashSet<char>(new[] { ' ', '\t', '\n', '\r' }));

With the HashSet version, a string of 11 million characters takes less than 700 ms (and that's in debug mode).
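The timing above is the answerer's own measurement. To check it against your own data, a rough Stopwatch sketch like the following compares a compiled Regex with a StringBuilder loop in the style of ExceptChars. It is an illustration only: the 30,000-address input format is invented, `ExceptChars` is re-declared here as a plain local function so the sample is self-contained, and a proper benchmark library would give more reliable numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

// Build a test string of 30,000 comma-separated addresses with stray whitespace.
var parts = new List<string>();
for (int i = 0; i < 30000; i++)
    parts.Add($" user{i}@example.com\t");
string input = string.Join(",", parts);

var blanks = new HashSet<char> { ' ', '\t', '\n', '\r' };
var regex = new Regex(@"\s+", RegexOptions.Compiled);

// Local copy of the ExceptChars idea: keep every char not in the exclusion set.
static string ExceptChars(string str, HashSet<char> toExclude)
{
    var sb = new StringBuilder(str.Length);
    foreach (char c in str)
        if (!toExclude.Contains(c))
            sb.Append(c);
    return sb.ToString();
}

// Run both paths once so JIT compilation is not included in the timing.
ExceptChars(input, blanks);
regex.Replace(input, "");

var sw = Stopwatch.StartNew();
string viaLoop = ExceptChars(input, blanks);
sw.Stop();
Console.WriteLine($"StringBuilder loop: {sw.Elapsed.TotalMilliseconds:F1} ms");

sw.Restart();
string viaRegex = regex.Replace(input, "");
sw.Stop();
Console.WriteLine($"Compiled Regex:     {sw.Elapsed.TotalMilliseconds:F1} ms");

Console.WriteLine(viaLoop == viaRegex ? "Results match" : "Results differ");
```

The input only contains spaces and tabs, so the HashSet and the `\s` pattern remove exactly the same characters and the two results can be compared directly.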

EDIT:

The previous code is generic and can exclude any character, but if you just want to remove blanks as fast as possible, you can use:

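    using System.Text;

    // Removes only space, tab, CR and LF; must be declared in a static class.
    public static string ExceptBlanks(this string str)
    {
        StringBuilder sb = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            switch (c)
            {
                case '\r':
                case '\n':
                case '\t':
                case ' ':
                    continue;   // skip this character and move to the next one
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }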

EDIT 2:

As correctly pointed out in the comments, the proper way to remove all whitespace is to use the char.IsWhiteSpace method:

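    using System.Text;

    // Removes every character for which char.IsWhiteSpace returns true;
    // must be declared in a static class.
    public static string ExceptBlanks(this string str)
    {
        StringBuilder sb = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            if (!char.IsWhiteSpace(c))
                sb.Append(c);
        }
        return sb.ToString();
    }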
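The practical difference between the switch-based ExceptBlanks and the char.IsWhiteSpace version shows up with whitespace other than space, tab, CR and LF. char.IsWhiteSpace also returns true for characters such as the no-break space (U+00A0) and the vertical tab (U+000B). The sketch below writes both variants as ordinary local functions; the names `ExceptListedBlanks` and `ExceptAllWhiteSpace` are hypothetical and the sample string is made up.

```csharp
using System;
using System.Text;

// Mirrors the switch-based version: only ' ', '\t', '\n' and '\r' are dropped.
static string ExceptListedBlanks(string str)
{
    var sb = new StringBuilder(str.Length);
    foreach (char c in str)
    {
        switch (c)
        {
            case ' ':
            case '\t':
            case '\n':
            case '\r':
                continue;
            default:
                sb.Append(c);
                break;
        }
    }
    return sb.ToString();
}

// Mirrors the char.IsWhiteSpace version: drops every Unicode whitespace char.
static string ExceptAllWhiteSpace(string str)
{
    var sb = new StringBuilder(str.Length);
    foreach (char c in str)
        if (!char.IsWhiteSpace(c))
            sb.Append(c);
    return sb.ToString();
}

// A no-break space (U+00A0) and a vertical tab (U+000B) slip past the switch.
string sample = "a@example.com,\u00A0b@example.com,\vc@example.com";
Console.WriteLine(ExceptListedBlanks(sample) == sample); // True: nothing was removed
Console.WriteLine(ExceptAllWhiteSpace(sample));          // a@example.com,b@example.com,c@example.com
```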
digEmAll answered Sep 28 '22 12:09


Given that string.Replace is implemented in C++ as part of the CLR runtime, I'm willing to bet that

    email.Replace(" ", "").Replace("\t", "").Replace("\n", "").Replace("\r", "");

will be the fastest implementation. If you need to handle every type of whitespace, you can supply the hex value of the Unicode equivalent (for example, "\u00A0" for a no-break space).
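As a sketch of that suggestion, a Unicode whitespace character can be written with a \uXXXX escape in a C# string literal and added to the chain; U+00A0 (no-break space) is used here as the example. `StripByReplace` is a hypothetical helper name and the input is made up. Each Replace call is a separate pass over the string, so the chain scans the input once per character it removes.

```csharp
using System;

// Hypothetical helper following the chained-Replace approach; "\u00A0" is the
// no-break space written as a Unicode escape.
static string StripByReplace(string email) =>
    email.Replace(" ", "")
         .Replace("\t", "")
         .Replace("\n", "")
         .Replace("\r", "")
         .Replace("\u00A0", "");

string input = "a@example.com,\u00A0b@example.com, \r\nc@example.com";
Console.WriteLine(StripByReplace(input)); // a@example.com,b@example.com,c@example.com
```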

Chris S answered Sep 28 '22 11:09