Split string based on regex pattern

Question

I have a message that I am trying to split on its timestamps.

import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

split_message = re.split(r'[a-zA-Z]{3} (0[1-9]|[1-2][0-9]|3[0-1]), ([0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC', message)

print(split_message)

Expected Output:

["This is update 1", "This is update 2", "This is update 3"]

Actual Output:

['', '10', '17', 'This is update 1.', '10', '15', 'This is update 2.', '10', '15', 'This is update 3.']

Not sure what I am missing.

Aram Becker · Accepted Answer

You are using capturing groups, which is why their contents also appear in the result list. Use non-capturing groups instead (they begin with ?:):

import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

split_message = re.split(r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC", message)

print(split_message)

You will, however, always get an empty string as the first entry, because the message starts with a match of the split pattern, so the text before that match is empty:

['', 'This is update 1.', 'This is update 2.', 'This is update 3.']
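If the goal is exactly the expected output from the question (no empty entry, no trailing periods), one option is to post-process the split result. This is only a sketch: the names TIMESTAMP and updates are made up for illustration, and it assumes every update ends with a single period that should be dropped.

```python
import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

# Same pattern as above, with non-capturing groups (TIMESTAMP is an illustrative name).
TIMESTAMP = re.compile(
    r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC"
)

# Skip empty pieces and strip the trailing period from each update.
updates = [part.rstrip(".") for part in TIMESTAMP.split(message) if part]

print(updates)
# ['This is update 1', 'This is update 2', 'This is update 3']
```

The `if part` filter removes the leading empty string, and `rstrip(".")` removes the period that belongs to the message text rather than to the timestamp.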

As stated in the docs:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
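A tiny example may make that behavior easier to see. The input string and variable names here are made up for illustration; only re.split itself comes from the standard library.

```python
import re

text = "a,b,c"

# Capturing group: the matched separators are returned alongside the pieces.
with_group = re.split(r"(,)", text)

# Non-capturing group: only the pieces between separators are returned.
without_group = re.split(r"(?:,)", text)

print(with_group)     # ['a', ',', 'b', ',', 'c']
print(without_group)  # ['a', 'b', 'c']
```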

Tags:

python

regex

split

PritamYaduvanshi
