
Python Regex Match Before Character AND Ignore White Space

Tags: python, regex
I'm trying to write a regex to match part of a string that comes before '/' but also ignores any leading or trailing white space within the match.

So far I've got ^[^\/]* which matches everything before the '/' but I can't figure out how to ignore the white space.

      123 / some text 123

should yield

123

and

     a test / some text 123

should yield

a test

harryk asked May 17 '19 20:05



4 Answers

That's a little bit tricky. Start the match at a non-whitespace character, then expand it lazily up to the position that is immediately followed by optional spaces and a slash:

\S.*?(?= *\/)

If a slash could be the first non-whitespace character in the input string, replace \S with [^\s\/]:

[^\s\/].*?(?= *\/)
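
A minimal Python sketch of how the second pattern might be applied to the inputs from the question. The helper name text_before_slash is made up for illustration, and the slash is left unescaped because Python does not require \/ in a pattern.

```python
import re

# Pattern from the answer above: start at a character that is neither
# whitespace nor a slash, expand lazily, and stop at the first position
# that is followed by optional spaces and a slash.
BEFORE_SLASH = re.compile(r"[^\s/].*?(?= */)")


def text_before_slash(line):
    """Return the trimmed text before the first '/', or None if nothing matches."""
    m = BEFORE_SLASH.search(line)
    return m.group(0) if m else None


print(text_before_slash("      123 / some text 123"))    # 123
print(text_before_slash("     a test / some text 123"))  # a test

# The \S variant can start on a leading slash; the [^\s/] variant cannot.
print(re.search(r"\S.*?(?= */)", "  / foo / bar").group(0))  # / foo
print(text_before_slash("  / foo / bar"))                    # foo
```

Note that the lookahead only skips literal spaces, so a tab directly before the slash would stay in the match; using \s* in the lookahead is one possible adjustment.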

revo answered Oct 24 '22 19:10


This expression is what you might want to explore:

^(.*?)(\s+\/.*)$

Here we have two capturing groups: the first collects your desired output and the second the part you don't want. The pattern is bounded by start and end anchors just to be safe; they can be removed if you want:

(.*?)(\s+\/.*)

Python Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"^(.*?)(\s+\/.*)$"

test_str = ("123 / some text 123\n"
    "anything else    / some text 123")

subst = "\\1"

# Limit the number of replacements with the count argument (0 means replace all)
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

JavaScript Demo

const regex = /^(.*?)(\s+\/.*)$/gm;
const str = `123 / some text 123
anything else    / some text 123`;
const subst = `$1`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

RegEx

If this wasn't your desired expression, you can modify/change your expressions in regex101.com.

RegEx Circuit

You can also visualize your expressions in jex.im.

Spaces

For spaces before your desired output, we can simply add an optional capturing group for the leading whitespace, which makes the desired output the second group:

 ^(\s+)?(.*?)(\s+\/.*)$

JavaScript Demo

const regex = /^(\s+)?(.*?)(\s+\/.*)$/gm;
const str = `      123 / some text 123
             anything else    / some text 123
123 / some text 123
anything else    / some text 123`;
const subst = `$2`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);
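
Since the question is tagged python, here is a rough Python counterpart of the demo above. It is only a sketch: the function name before_slash is hypothetical, and it reads the second group from a match on a single line instead of substituting.

```python
import re

# Same three groups as above: optional leading whitespace, the wanted
# text (lazy), then whitespace + slash + the rest of the line.
PATTERN = re.compile(r"^(\s+)?(.*?)(\s+/.*)$")


def before_slash(line):
    """Return group 2 (the wanted text), or None when the line does not match."""
    m = PATTERN.match(line)
    return m.group(2) if m else None


for line in ("      123 / some text 123",
             "anything else    / some text 123",
             "123/ no space before the slash"):
    print(repr(before_slash(line)))
```

The last input is there to show a limitation: because the third group starts with \s+, a slash that directly follows the text produces no match at all.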


Emma answered Oct 24 '22 19:10


Here is a possible solution

Regex

(?<!\/)\S.*\S(?=\s*\/)

Example

import re  # the third-party regex module exposes the same search() call

pattern = r'(?<!\/)\S.*\S(?=\s*\/)'

string = ' 123 / some text 123'
test = re.search(pattern, string)
print(test.group(0))
# prints 123

string = 'a test / some text 123'
test = re.search(pattern, string)
print(test.group(0))
# prints a test

Short explanation

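  • (?<!\/) says the match cannot start immediately after a / symbol.
  • \S.*\S matches anything greedily (.*) while making sure the match starts and ends with a non-whitespace character (\S), so it needs at least two characters. Because .* is greedy, the match runs up to the last / when the string contains several.
  • (?=\s*\/) means a possible match must be followed by a / symbol or by white space + a /.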
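
The behaviour described in the list above can be checked with a short Python sketch. VARIANT is not part of the answer; it is a hypothetical alternative that also accepts a single character and stops at the first slash. The helper name first_match is likewise made up.

```python
import re

# Pattern from the answer (slash left unescaped, which Python allows).
ANSWER = re.compile(r"(?<!/)\S.*\S(?=\s*/)")

# Hypothetical variant, not from the answer: one non-space, non-slash
# character, optionally followed by more slash-free text that ends in a
# non-space character.
VARIANT = re.compile(r"[^\s/](?:[^/]*[^\s/])?(?=\s*/)")


def first_match(pattern, text):
    """Return the first match of pattern in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


for text in (" 123 / some text 123", "x / y", "a / b / c"):
    print(repr(text), repr(first_match(ANSWER, text)), repr(first_match(VARIANT, text)))
```

For the question's own inputs both patterns agree; they differ only for one-character values and for strings with more than one slash.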

user101 answered Oct 24 '22 18:10


You could do it without a regex:

my_string = "      123 / some text 123"
match = my_string.split("/")[0].strip()
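
A possible variation on the same idea, sketched with str.partition so that a string without any slash can be told apart from one with a slash. The function name before_slash and the choice to return None in that case are assumptions, not part of the answer.

```python
def before_slash(text):
    """Return the trimmed text before the first '/', or None if there is no '/'."""
    head, sep, _tail = text.partition("/")
    # sep is an empty string when no slash was found
    return head.strip() if sep else None


print(before_slash("      123 / some text 123"))    # 123
print(before_slash("     a test / some text 123"))  # a test
print(before_slash("no slash here"))                 # None
```

With split("/")[0], a string that has no slash is returned whole (after stripping), which may or may not be what you want.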

Boris answered Oct 24 '22 18:10