I convert HTML to plain text, but the result contains many extra line breaks and spaces. How can I remove them?
Use JavaScript's string.replace() method with a regular expression to remove extra spaces. The regex token that matches any whitespace character is \s. Add the g flag so that every match is replaced, not just the first one.
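As a minimal sketch of that approach (the sample string and variable names are made up for illustration), the following compares a replacement without the g flag against one with it:

```javascript
// Hypothetical sample: text left over after stripping HTML tags.
const raw = "Hello   \n\n  world \t again";

// Without the g flag, only the first run of whitespace is replaced.
const firstOnly = raw.replace(/\s+/, " ");

// With the g flag, every run of whitespace collapses to a single space.
const collapsed = raw.replace(/\s+/g, " ");

console.log(JSON.stringify(firstOnly));
console.log(JSON.stringify(collapsed));
```

Matching \s+ rather than \s means a whole run of whitespace is replaced in one step, so no doubled spaces are left behind.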
You can easily trim unnecessary whitespace from the start and the end of a string or the lines in a text file by doing a regex search-and-replace. Search for ^[ \t]+ and replace with nothing to delete leading whitespace (spaces and tabs). Search for [ \t]+$ to trim trailing whitespace. To trim every line rather than only the ends of the whole string, turn on multiline mode so that ^ and $ match at each line break.
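A short JavaScript sketch of the two searches above, applied per line with the m (multiline) and g flags. The helper name trimLines and the sample text are assumptions made for this example:

```javascript
// Hypothetical helper: trims spaces and tabs at both ends of every line.
function trimLines(text) {
  return text
    .replace(/^[ \t]+/gm, "")  // leading spaces and tabs on each line
    .replace(/[ \t]+$/gm, ""); // trailing spaces and tabs on each line
}

const untrimmed = "  first line\t \n\t second line  \nthird";
const trimmed = trimLines(untrimmed);
console.log(JSON.stringify(trimmed));
```

Because the character class is [ \t] rather than \s, the line breaks themselves are left alone and only the padding around each line is removed.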
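Spaces can be found simply by putting a space character in your regex. Any whitespace (spaces, tabs, line breaks) can be found with \s. The \b word boundary marker does not match whitespace itself; it is a zero-width assertion that matches the position between a word character and a non-word character. To find whitespace that sits between two words, combine the two, for example \b\s+\b.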
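To illustrate the difference between a literal space, \s, and \b, here is a small JavaScript sketch. The sample sentence and variable names are invented for the example:

```javascript
// Hypothetical sample containing double spaces, a tab, and punctuation.
const sentence = "one  two\t three , four";

// A literal space in the regex matches only the space character.
const spaceCount = (sentence.match(/ /g) || []).length;

// \s also matches the tab (and would match line breaks).
const whitespaceCount = (sentence.match(/\s/g) || []).length;

// \b\s+\b matches whitespace only when a word character sits on both sides,
// so the whitespace around the comma is left untouched.
const betweenWords = sentence.replace(/\b\s+\b/g, "_");

console.log(spaceCount, whitespaceCount, betweenWords);
```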
string new_string = Regex.Replace(orig_string, @"\s", "");
removes all whitespace, including line breaks.
string new_string = Regex.Replace(orig_string, @"\s+", " ");
collapses each run of whitespace into a single space.
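I'm assuming that you want to replace two or more consecutive spaces with a single space, and two or more consecutive newlines with a single newline.
If that's correct, then you could use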
resultString = Regex.Replace(subjectString, @"( |\r?\n)\1+", "$1");
This keeps the original "type" of whitespace intact and also preserves Windows line endings correctly. If you also want to "condense" multiple tabs into one, use
resultString = Regex.Replace(subjectString, @"( |\t|\r?\n)\1+", "$1");
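The backreference pattern carries over to JavaScript unchanged. The sketch below is a JavaScript rendering of it, not the C# call itself; the function name collapseRepeats and the sample input are assumptions for illustration:

```javascript
// Hypothetical helper: collapses repeats of the SAME whitespace unit.
function collapseRepeats(text) {
  // ( |\t|\r?\n) captures one unit (space, tab, or line ending);
  // \1+ then matches one or more further copies of exactly that unit.
  return text.replace(/( |\t|\r?\n)\1+/g, "$1");
}

const messy = "a   b\t\tc\r\n\r\n\r\nd\n\ne \n f";
const tidy = collapseRepeats(messy);
console.log(JSON.stringify(tidy));
```

Mixed sequences such as a space followed by a newline are left alone, because \1 only repeats the unit that was captured first.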
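To condense a run of newlines and spaces (any number of each) into a single newline, use
resultString = Regex.Replace(subjectString, @"(?:(?:\r?\n)+ +){2,}", "\n");
Note that the replacement is the ordinary string "\n", not the verbatim string @"\n". .NET does not interpret character escapes in replacement patterns, so @"\n" would insert a literal backslash followed by the letter n.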
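For comparison, here is the same pattern as a JavaScript sketch. The function name condense and the test strings are assumptions made for this example:

```javascript
// Hypothetical helper: replaces repeated "newline(s) + space(s)" groups
// with a single newline.
function condense(text) {
  // Each group is one or more line endings followed by one or more spaces;
  // {2,} requires at least two such groups in a row.
  return text.replace(/(?:(?:\r?\n)+ +){2,}/g, "\n");
}

const indented = "a\n  \n \n   b";
const condensed = condense(indented);
console.log(JSON.stringify(condensed));
```

As written, the pattern only matches when every newline group is followed by at least one space and the group occurs at least twice, so input such as "a\n b" or "a\n\nb" passes through unchanged. If those cases matter, the pattern would need to be loosened.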