
Regular expression for matching non-whitespace in Python

I want to use re.search to extract the first set of non-whitespace characters. I have the following pseudoscript that recreates my problem:

#!/usr/bin/env python2.7
import re

line = "STARC-1.1.1.5             ConsCase    WARNING    Warning"
m = re.search('^[^\S]*?',line)
if m:
    print m.group(0)

It seems to be printing the whitespace instead of STARC-1.1.1.5

So far as I understand it, this regular expression is saying: At the start of the line, find a set of nonwhitespace characters, don't be greedy

I was pretty sure this would work. The documentation says I can use \S inside [] to match non-whitespace, so I'm not sure where the issue is.

Now, I know this probably looks weird: why am I not using some other function to do this? Well, there's more than one way to skin a cat, and I'm still getting the hang of regular expressions in Python, so I'd like to know how I can use re.search to extract this field in this fashion.

Oliver Farren asked Jan 05 '17 11:01


2 Answers

The [^\S] is a negated character class that is equal to \s (whitespace pattern). The *? is a lazy quantifier that matches zero or more characters, but as few as possible, and when used at the end of the pattern never actually matches any characters.
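To make the point about the lazy quantifier concrete, here is a small sketch that runs the original pattern and its greedy variant on the sample line from the question. The variable names (lazy, padded, and so on) are illustrative only, and print() is used so the snippet also runs on Python 3.

```python
import re

line = "STARC-1.1.1.5             ConsCase    WARNING    Warning"

# [^\S] means "not non-whitespace", i.e. whitespace. The line starts with "S",
# so there is no leading whitespace, and *? is content with zero characters.
lazy = re.search(r'^[^\S]*?', line)
print(repr(lazy.group(0)))           # ''

# Even when leading whitespace exists, a lazy quantifier at the very end of
# the pattern stops at zero characters; only the greedy form consumes it.
padded = "   " + line
lazy_padded = re.search(r'^[^\S]*?', padded)
greedy_padded = re.search(r'^[^\S]*', padded)
print(repr(lazy_padded.group(0)))    # ''
print(repr(greedy_padded.group(0)))  # '   '
```

In both lazy cases the match succeeds, which is why the if m: branch runs, but the matched text is empty.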

Replace your m = re.search('^[^\S]*?',line) line with

m = re.match(r'\S+',line)

or - if you want to also allow an empty string match:

m = re.match(r'\S*',line)

The re.match method anchors the pattern at the start of the string. With re.search, you need to keep the ^ anchor at the start of the pattern:

m = re.search(r'^\S+',line)
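The difference between the two functions only shows up when the string does not begin with a matching character. A minimal sketch, assuming a hypothetical indented copy of the sample line:

```python
import re

# Hypothetical input: the same kind of line, but with leading spaces.
indented = "   STARC-1.1.1.5   ConsCase"

# re.match only tries position 0, where a space sits, so \S+ fails.
matched = re.match(r'\S+', indented)
print(matched)                 # None

# re.search without ^ scans forward and finds the first token anyway.
found = re.search(r'\S+', indented)
print(found.group(0))          # STARC-1.1.1.5

# re.search with ^ is pinned to the start again, so it fails like re.match.
anchored = re.search(r'^\S+', indented)
print(anchored)                # None
```

Note that with the re.MULTILINE flag, ^ also matches right after each newline, so re.search(r'^\S+', ...) and re.match(r'\S+', ...) are not interchangeable on multi-line input.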

See the Python demo:

import re
line = "STARC-1.1.1.5             ConsCase    WARNING    Warning"
# Raw string keeps \S from being treated as a string escape
m = re.search(r'^\S+', line)
if m:
    print(m.group(0))
# => STARC-1.1.1.5

In this case, though, a plain split() is enough:

res = line.split()
print(res[0])

See another Python demo.
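One caveat with indexing the result of split() is that an empty or all-whitespace line yields an empty list. The sketch below wraps the idea in a helper; first_field is a hypothetical name introduced here for illustration, not something from the answer.

```python
line = "STARC-1.1.1.5             ConsCase    WARNING    Warning"

def first_field(text):
    """Return the first whitespace-delimited token, or None if there is none.

    Hypothetical helper for illustration only.
    """
    # maxsplit=1 stops after the first token instead of splitting the whole line.
    parts = text.split(None, 1)
    return parts[0] if parts else None

print(first_field(line))     # STARC-1.1.1.5
print(first_field("   "))    # None
```

Passing None as the separator keeps the default behavior of splitting on runs of whitespace and ignoring leading whitespace.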

Wiktor Stribiżew answered Oct 01 '22 19:10


\s matches a whitespace character.

\S matches a non-whitespace character.

[...] matches a character in the set ....

[^...] matches a character not in the set ....

[^\S] matches a character that is not a non-whitespace character, i.e. it matches a whitespace character.
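The rules above can be checked mechanically. This is a rough sketch over a handful of sample characters chosen for illustration, not an exhaustive proof:

```python
import re

# Illustrative sample: three whitespace characters and three others.
samples = [" ", "\t", "\n", "a", "-", "1"]

for ch in samples:
    # [^\S] should agree with \s, and [^\s] with \S, on every sample.
    assert bool(re.match(r'[^\S]', ch)) == bool(re.match(r'\s', ch))
    assert bool(re.match(r'[^\s]', ch)) == bool(re.match(r'\S', ch))

# Characters accepted by the double negative are exactly the whitespace ones.
whitespace_hits = [ch for ch in samples if re.match(r'[^\S]', ch)]
print(whitespace_hits)   # [' ', '\t', '\n']
```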

melpomene answered Oct 01 '22 19:10