
Python regex match space only

Tags:

python

regex

In Python 3, how do I match only the space character, and not other whitespace such as newline \n or tab \t?

I've seen the \s+[^\n] answer in "Regex match space not \n", but it does not work for the following example:

import re
a = 'rasd\nsa sd'
print(re.search(r'\s+[^ \n]', a))

The result is <_sre.SRE_Match object; span=(4, 6), match='\ns'>, so the newline is what gets matched.

Dimitry asked Jul 02 '16 16:07



2 Answers

No need for special groups. Just create a regex with a space character. The space character does not have any special meaning, it just means "match a space".

RE = re.compile(' +') 

So for your case

a = 'rasd\nsa sd'
print(re.search(' +', a))

would give

<_sre.SRE_Match object; span=(7, 8), match=' '> 
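As a minimal sketch of the same idea on a slightly longer input, the following finds every run of spaces while leaving the newline and the tab alone. The string `text` and the name `space_run` are made up for illustration and are not part of the question.

```python
import re

# Hypothetical sample: contains a newline, a tab, one single space
# and one run of two spaces.
text = 'rasd\nsa sd\tx  y'

# A literal space followed by '+' matches one or more spaces only.
space_run = re.compile(' +')

# All runs of spaces; '\n' and '\t' are never part of a match.
print(space_run.findall(text))   # [' ', '  ']

# Positions of each run in the string.
print([m.span() for m in space_run.finditer(text)])   # [(7, 8), (12, 14)]
```

re.search returns only the first match, so findall or finditer is the better fit when every run of spaces is needed.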
Resonance answered Sep 23 '22 08:09


If you want to match one or more whitespace characters except the newline and the tab, use

r"[^\S\n\t]+" 

[^\S] matches any character that is not a non-whitespace character, that is, any whitespace character. Because the character class is negated, every character you add to it (here \n and \t) is excluded from the match.

Python demo:

import re
a = 'rasd\nsa sd'
print(re.findall(r'[^\S\n\t]+', a))
# => [' ']

Some more considerations: \s matches only [ \t\n\r\f\v] when the re.ASCII flag is used. So, if you only need to match ASCII whitespace, you might as well use [ \r\f\v], which leaves out the characters you want to exclude. If you need to work with Unicode strings, the solution above is a viable one.
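To make the ASCII versus Unicode difference concrete, here is a small sketch. The sample string is an assumption chosen for illustration: it contains a regular space, a no-break space (U+00A0), a tab and a newline.

```python
import re

# Hypothetical sample: regular space, no-break space, tab, newline.
text = 'a b\u00a0c\td\ne'

# Default (Unicode) mode for str patterns: \s also covers U+00A0,
# so the negated class matches both the space and the no-break space.
unicode_hits = re.findall(r'[^\S\n\t]+', text)

# With re.ASCII, \s is limited to [ \t\n\r\f\v], so U+00A0 counts
# as non-whitespace and is no longer matched.
ascii_hits = re.findall(r'[^\S\n\t]+', text, flags=re.ASCII)

# The explicit ASCII-only class gives the same result as the ASCII run.
explicit_hits = re.findall(r'[ \r\f\v]+', text)

print(unicode_hits)    # [' ', '\xa0']
print(ascii_hits)      # [' ']
print(explicit_hits)   # [' ']
```

Which variant is appropriate depends on whether characters such as the no-break space should count as "space" in your data.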

Wiktor Stribiżew answered Sep 25 '22 08:09