Regex to Match Horizontal White Spaces

Tags: regex, unicode, python-unicode, python-2.7

I need a regex in Python 2 that matches only horizontal whitespace, not newlines.

\s matches all whitespace characters, including newlines.

>>> re.sub(r"\s", "", "line 1.\nline 2\n")
'line1.line2'

\h does not work at all.

>>> re.sub(r"\h", "", "line 1.\nline 2\n")
'line 1.\nline 2\n'

[\t ] works, but I am not sure whether I am missing other whitespace characters, especially Unicode ones such as \u00A0 (non-breaking space) or \u200A (hair space). Many more whitespace characters are listed at the following link: https://www.cs.tut.fi/~jkorpela/chars/spaces.html (dead link)

>>> re.sub(r"[\t ]", "", u"line 1.\nline 2\n\u00A0\u200A\n", flags=re.UNICODE)
u'line1.\nline2\n\xa0\u200a\n'
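One way to check what [\t ] might be missing is to ask Python itself. The sketch below scans the Basic Multilingual Plane and lists every character that \s matches in Unicode mode. The names SPACE_RE and unicode_whitespace are illustrative, limiting the scan to U+0000-U+FFFF is an assumption made to keep it quick, and the exact list depends on the Unicode version bundled with your interpreter. It is written to run on both Python 2.7 and Python 3.

```python
import re
import unicodedata

try:
    unichr  # exists on Python 2
except NameError:
    unichr = chr  # Python 3: chr() already returns a Unicode character

# In Unicode mode, \s matches more than [ \t\n\r\f\v].
SPACE_RE = re.compile(r"\s", re.UNICODE)


def unicode_whitespace():
    """Return every BMP character (U+0000-U+FFFF) that \\s matches."""
    return [unichr(cp) for cp in range(0x10000) if SPACE_RE.match(unichr(cp))]


for ch in unicode_whitespace():
    # Control characters such as \t and \n have no name in unicodedata,
    # so fall back to a placeholder.
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch, "<control>")))
```

The output includes both \u00A0 and \u200A from the question, along with vertical characters such as \n, \r and U+2028, which is why \s alone is too broad.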

Do you have any suggestions?

asked Sep 07 '17 12:09

Memduh

2 Answers

I ended up using [^\S\n] (any whitespace character except a newline) instead of listing every Unicode whitespace character.

>>> re.sub(r"[^\S\n]", "", u"line 1.\nline 2\n\u00A0\u200A\n", flags=re.UNICODE)
u'line1.\nline2\n\n'

>>> re.sub(r"[\t ]", "", u"line 1.\nline 2\n\u00A0\u200A\n", flags=re.UNICODE)
u'line1.\nline2\n\xa0\u200a\n'

It works as expected.
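Note that [^\S\n] excludes only \n, so it still matches \r, \f, \v and Unicode line breaks such as U+2028. If those should survive too, the same double-negation trick can subtract them as well. The sketch below assumes that the line boundaries documented for Python 3's str.splitlines() are the "vertical" characters to keep; HSPACE and LINE_BREAKS are illustrative names.

```python
import re

# Assumption: every character str.splitlines() breaks on counts as
# "vertical"; everything else that \s matches counts as horizontal.
LINE_BREAKS = u"\n\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029"

# [^\S...] means "not (non-whitespace or a line break)", i.e. whitespace
# that is not a line break. The non-raw u"" literal puts the actual
# characters into the class (Python 2's re has no \u escape), and "\\S"
# keeps the backslash for re.
HSPACE = re.compile(u"[^\\S" + LINE_BREAKS + u"]", re.UNICODE)

text = u"line 1.\nline 2\r\n\u00A0\u200A\u2028end"

# [^\S\n] also removes the \r and U+2028 (LINE SEPARATOR) ...
loose = re.sub(r"[^\S\n]", u"", text, flags=re.UNICODE)
# ... while HSPACE leaves every line break in place.
strict = HSPACE.sub(u"", text)

print(repr(loose))
print(repr(strict))
```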

answered Oct 13 '22 09:10

Memduh

If you only want to match actual spaces, try a plain ( )+ (the parentheses are only there for readability*). If you want to match spaces and tabs, try [ \t]+ (the + makes a run of, e.g., 3 space characters count as a single match).
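A small sketch of the difference the + makes, using an illustrative sample string. With +, each run of spaces and tabs is one match, so replacing it with a single space collapses the run. Without +, every character is replaced on its own. The newline is never touched in either case.

```python
import re

TEXT = u"a   b\t\tc\nd  e"

# "[ \t]+" matches a whole run of spaces/tabs at once, so each run
# collapses to a single space; the newline is untouched.
collapsed = re.sub(r"[ \t]+", u" ", TEXT)

# Without "+", every space or tab is its own match and is replaced
# individually, so the run lengths are preserved.
per_char = re.sub(r"[ \t]", u" ", TEXT)

print(repr(collapsed))
print(repr(per_char))
```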

It is true that Unicode has other whitespace characters. However, you are highly unlikely to encounter them in source code, and fairly unlikely to encounter the less common ones in other text.

If you want to, you can include \u00A0 (non-breaking space, fairly common in scientific papers and on some websites; this is HTML's &nbsp;), en space \u2002 (&ensp;), em space \u2003 (&emsp;) or thin space \u2009 (&thinsp;).

You can find a variety of other Unicode whitespace characters on Wikipedia, but I highly doubt it's necessary to include them. I'd just stick to space, tab and maybe non-breaking space, that is, [ \t\u00A0]+.
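If you do spell the characters out, be careful how you write the pattern in Python 2. Its re module does not understand \uXXXX escapes (they were added in Python 3.3), so a raw string like r"[ \t\u00A0]+" would match the characters u, 0 and A instead of the non-breaking space. A sketch using a non-raw unicode literal with the characters listed above (HSPACE_EXPLICIT and the sample text are illustrative):

```python
import re

# Space, tab, no-break space, en space, em space and thin space, as listed
# above. This is a NON-raw u"" literal on purpose: Python itself turns
# \u00A0 etc. into the real characters before re sees the pattern.
HSPACE_EXPLICIT = re.compile(u"[ \t\u00A0\u2002\u2003\u2009]+")

sample = u"10\u00A0km,\u2003next\tfield\nnew line\u200Aend"

# Every run of listed characters becomes one ordinary space. The newline is
# kept, and so is U+200A (hair space), because it is not in the list.
cleaned = HSPACE_EXPLICIT.sub(u" ", sample)
print(repr(cleaned))
```

The surviving hair space shows the trade-off of an explicit list: anything you leave out is silently kept.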

What do you intend to match with \h, anyway? It is not a valid escape in Python's re module (as far as I know, it comes from Perl and PCRE, where it means horizontal whitespace).
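For reference, here is a sketch that emulates PCRE's \h with an explicit character class. The character list is taken from the horizontal space set that the PCRE documentation gives for \h, and the names PCRE_H_CLASS and H are illustrative. Python's re has no \h of its own. Python 2 treats it as a literal h, which is why the question's substitution changed nothing, and Python 3.6+ rejects it with re.error.

```python
import re

# Horizontal space characters as listed for PCRE's \h: tab, space,
# no-break space, Ogham space mark, Mongolian vowel separator,
# U+2000-U+200A, narrow no-break space, medium mathematical space,
# ideographic space. Non-raw u"" literal so it also works on Python 2.
PCRE_H_CLASS = (u"[\t\u0020\u00A0\u1680\u180E"
                u"\u2000-\u200A\u202F\u205F\u3000]")
H = re.compile(PCRE_H_CLASS)

sample = u"a\u3000b\u2005c\td\ne\u00A0f"
print(repr(H.sub(u"", sample)))

# Vertical whitespace (line feed, CR, VT, FF, NEL, U+2028, U+2029)
# is not in the class.
print(H.search(u"\n\r\x0b\x0c\x85\u2028\u2029") is None)
```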

*Stack Overflow doesn't display spaces at the edges of inline code.

answered Oct 13 '22 10:10

PixelMaster
