
Matching a space in regex

Tags: regex, php

People also ask

How do you match a space in regex?

The most common regex tokens for matching whitespace are \s and \s+. The difference is that \s matches a single whitespace character, while \s+ matches a run of one or more whitespace characters.
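As a rough illustration of the difference, here is a minimal sketch that counts matches for each token. The sample string and variable names are made up for the example.

```php
<?php
// Hypothetical sample: two spaces between "a" and "b", one tab between "b" and "c".
$text = "a  b\tc";

// \s matches one whitespace character per match.
$singleCount = preg_match_all('/\s/', $text);

// \s+ matches each unbroken run of whitespace as a single match.
$runCount = preg_match_all('/\s+/', $text);

echo "$singleCount $runCount\n"; // prints "3 2"
```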

How do you define space in regex?

The most common forms of whitespace you will use with regular expressions are the space (␣), the tab (\t), the new line (\n) and the carriage return (\r), which appears in Windows line endings. Each of these escapes matches its respective whitespace character.

Is space a special character in regex?

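Regex uses the backslash ( \ ) for two purposes: to introduce metacharacters such as \d (digit), \D (non-digit), \s (whitespace), \S (non-whitespace), \w (word character), \W (non-word character); and to escape special regex characters, e.g., \. for . , \+ for + , \* for * , \? for ? . A plain space is not one of the special characters, so it matches itself.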
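A small sketch of the two cases, assuming PHP's preg functions; the sample strings and variable names are invented for illustration. A space needs no escaping, whereas a dot changes meaning when escaped.

```php
<?php
// A plain space in a pattern matches a literal space; no backslash is needed.
$spaceMatches = preg_match('/a b/', 'a b');   // 1

// An unescaped dot matches any character, so "1x5" matches too.
$looseDot = preg_match('/1.5/', '1x5');       // 1

// An escaped dot matches only a literal ".".
$strictDot = preg_match('/1\.5/', '1x5');     // 0
$literalDot = preg_match('/1\.5/', '1.5');    // 1
```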

What is regex for space in Java?

In Java, as in most engines, the regular expression \s matches a single whitespace character, while \s+ matches one or more whitespace characters. Inside a Java string literal the backslash must be doubled, as in "\\s+".


If you're looking for a space, that would be " " (one space).

If you're looking for one or more, it's "  *" (that's two spaces and an asterisk) or " +" (one space and a plus).

If you're looking for common whitespace (spaces and tabs), use "[ X]" or "[ X][ X]*" or "[ X]+" where X is the physical tab character (each X is preceded by a single space inside the brackets).

These will work in every regex engine I've ever seen (some of which don't even have the one-or-more "+" quantifier, ugh).

If you know you'll be using one of the more modern regex engines, "\s" and its variations are the way to go. In addition, I believe word boundaries ("\b") match at the start and end of lines as well, which is important when you're looking for words that may appear without preceding or following spaces.
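To make the options concrete, here is a minimal PHP sketch comparing the literal-space pattern, the space-or-tab class, and "\s", plus a word-boundary check. The input strings and variable names are assumptions chosen for the example.

```php
<?php
// Hypothetical input: a space, a tab, and a space between the two words.
$line = "foo \t bar";

// " +" only knows about the space character, so it stops at the tab.
preg_match('/ +/', $line, $spaces);

// "[ \t]+" (double-quoted, so PHP inserts a physical tab) spans the whole gap.
preg_match("/[ \t]+/", $line, $blanks);

// "\s+" spans the same gap and would also accept newlines.
preg_match('/\s+/', $line, $any);

// \b matches at the very start and end of the subject, with no spaces around.
$bounded = preg_match('/\bbar\b/', 'bar');

var_dump($spaces[0] === " ", $blanks[0] === " \t ", $any[0] === " \t ", $bounded === 1);
```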

For PHP specifically, this page may help.

From your edit, it appears you want to remove all non-valid characters. The start of this is (note the space inside the regex):

$newtag = preg_replace ("/[^a-zA-Z0-9 ]/", "", $tag);
#                                    ^ space here

If you also want trickery to ensure there's only one space between each word and none at the start or end, that's a little more complicated (and probably another question) but the basic idea would be:

$newtag = preg_replace ("/ +/", " ", $newtag); # convert all multispaces to space
$newtag = preg_replace ("/^ /", "", $newtag);  # remove space from start
$newtag = preg_replace ("/ $/", "", $newtag);  # and end
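Putting the intended steps together, a minimal sketch might look like the following. The function name clean_tag is hypothetical, and trim() is swapped in for the two anchored replacements as a simpler way to strip spaces from both ends.

```php
<?php
// Hypothetical helper: each step feeds its result into the next one.
function clean_tag(string $tag): string
{
    $tag = preg_replace('/[^a-zA-Z0-9 ]/', '', $tag); // drop disallowed characters
    $tag = preg_replace('/ +/', ' ', $tag);           // collapse runs of spaces
    return trim($tag, ' ');                           // strip spaces at both ends
}

echo clean_tag("  hello,   world!  "), "\n"; // prints "hello world"
```

The important detail is that every call operates on the output of the previous one; passing the original string each time would discard the earlier replacements.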

Cheat Sheet

Here is a small cheat sheet of everything you need to know about whitespace in regular expressions:

[[:blank:]]

Space or tab only, not newline characters. It is the same as writing [ \t].

[[:space:]] & \s

[[:space:]] and \s are the same. They both match any whitespace character: spaces, newlines, tabs, etc.

\v

Matches vertical Unicode whitespace.

\h

Matches horizontal whitespace, including Unicode characters: spaces, tabs, and non-breaking, mathematical, and ideographic spaces.
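The classes listed so far can be compared with a quick count. This is a sketch assuming PHP 7 or later (for the "\u{...}" string escape) and its bundled PCRE engine; the sample string and variable names are made up.

```php
<?php
// Hypothetical sample: a space, a tab, and a newline between "a" and "b".
$sample = "a \t\nb";

$blank = preg_match_all('/[[:blank:]]/', $sample); // space and tab
$space = preg_match_all('/[[:space:]]/', $sample); // space, tab, newline
$s     = preg_match_all('/\s/', $sample);          // same three characters
$h     = preg_match_all('/\h/', $sample);          // space and tab
$v     = preg_match_all('/\v/', $sample);          // newline only

echo "$blank $space $s $h $v\n"; // prints "2 3 3 2 1"

// With the u modifier, \h also matches a UTF-8 non-breaking space (U+00A0).
$nbsp = preg_match('/\h/u', "\u{00A0}");
```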

x (eXtended flag)

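Ignores whitespace in the pattern itself (not in the text being matched), except when it is escaped or inside a character class. Keep in mind that this is a flag, so you add it after the closing delimiter, like /hello/gmx. (PHP's preg functions have no g flag, so there it would be /hello/mx.)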

For example, if you write an expression like /hello world/x, it will match helloworld, but not hello world. The extended flag also allows comments in your regex.

Example

/helloworld #hello this is a comment/x

If you need to match a space while this flag is on, escape it: "\ " (a backslash followed by a space).
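A short PHP sketch of the extended flag's behavior follows. The patterns and variable names are illustrative, and the multi-line pattern is just one way to lay out comments.

```php
<?php
// With x, the unescaped space in the pattern is ignored.
$joined = preg_match('/hello world/x', 'helloworld');    // 1
$spaced = preg_match('/hello world/x', 'hello world');   // 0

// An escaped space is kept, so this matches the two-word form.
$escaped = preg_match('/hello\ world/x', 'hello world'); // 1

// x also allows the pattern to be split over lines with # comments.
$pattern = '/
    hello   # first word
    \ +     # one or more literal spaces
    world   # second word
/x';
$commented = preg_match($pattern, 'hello   world');      // 1
```

Comments inside such a pattern should avoid the delimiter character, since PHP looks for the closing delimiter before the regex engine ever sees the comment.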


To match exactly the space character, you can use the octal escape \040 or the hexadecimal escape \x20 (both denote character code 32, the space).
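For instance, a minimal PHP sketch (variable names are made up) shows both escapes matching a space and nothing else, and still working when the extended flag is on:

```php
<?php
$hex      = preg_match('/a\x20b/', 'a b');     // 1: \x20 is the space
$octal    = preg_match('/a\040b/', 'a b');     // 1: \040 is the same character
$tab      = preg_match('/a\x20b/', "a\tb");    // 0: a tab is not a space
$extended = preg_match('/a \x20 b/x', 'a b');  // 1: unescaped pattern spaces ignored, \x20 kept
```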

Here is the regex syntax reference: https://www.regular-expressions.info/nonprint.html.


In Perl the switch is \s (whitespace).