How to Remove Excess Whitespace in String Using Regex

Tags: regex, perl

I have a series of paragraphs that I want to parse using regular expressions, but unfortunately the paragraphs contain a lot of extra whitespace between sentences, and sometimes between words. I would like to remove all the excess whitespace, but I'm unsure how. Does anyone have any ideas? I don't want to remove all whitespace, which is the only approach I've found so far. I want to keep regular paragraph format: a single space after every word, and a single space after every punctuation mark that is followed by a word. I am coding in Perl.

Any help would be appreciated!

Sheldon asked Jan 31 '11 00:01


1 Answer

Canonicalize horizontal whitespace:

s/\h+/ /g;

Canonicalize vertical whitespace:

s/\v+/\n/g;

Canonicalize all whitespace:

s/[\h\v]+/ /g;
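
For the case in the question, where paragraph breaks should survive but everything else should collapse to single spaces, the substitutions above could be combined roughly as follows. This is only a sketch under some assumptions: paragraphs are taken to be separated by one or more blank lines, the helper name normalize_paragraphs and the sample text are made up for illustration, and \h, \v and \R need Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): collapse excess
# whitespace inside each paragraph while keeping paragraph breaks.
sub normalize_paragraphs {
    my ($text) = @_;

    # Assumption: a paragraph break is two or more line endings in a row,
    # possibly with stray spaces or tabs on the otherwise empty lines.
    # \R matches "\r\n" as a single line ending.
    my @paragraphs = split /(?:\h*\R){2,}/, $text;

    for my $para (@paragraphs) {
        $para =~ s/[\h\v]+/ /g;   # any run of whitespace becomes one space
        $para =~ s/^ | $//g;      # drop the single leading/trailing space
    }

    # Skip empty pieces (e.g. when the text starts with blank lines)
    # and rejoin the paragraphs with exactly one blank line between them.
    return join "\n\n", grep { length } @paragraphs;
}

my $sample = "  Hello,   world.\tThis  is\n  one paragraph.  \n\n\n"
           . "  Second    paragraph here. \n";

print normalize_paragraphs($sample), "\n";
```

Single line breaks inside a paragraph are treated as ordinary whitespace here, so hard-wrapped text is rejoined into one line per paragraph. If the line breaks should be preserved, applying only s/\h+/ /g would be the closer fit.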

tchrist answered Nov 15 '22 06:11