
Split string based on regex pattern

I have a message which I am trying to split.

import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

split_message = re.split(r'[a-zA-Z]{3} (0[1-9]|[1-2][0-9]|3[0-1]), ([0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC', message)

print(split_message)

Expected Output:

["This is update 1", "This is update 2", "This is update 3"]

Actual Output:

['', '10', '17', 'This is update 1.', '10', '15', 'This is update 2.', '10', '15', 'This is update 3.']

Not sure what I am missing.

PritamYaduvanshi asked Nov 19 '25 06:11


1 Answer

Your pattern uses capturing groups, which is why their contents also appear in the result list. Use non-capturing groups instead (they begin with ?:):

import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

split_message = re.split(r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC", message)

print(split_message)

You will, however, still get an empty first entry, because the message starts with a match, so the text before the first separator is an empty string:

['', 'This is update 1.', 'This is update 2.', 'This is update 3.']
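If the leading empty string and the trailing periods are unwanted, one option is to filter and trim the pieces after splitting. This is only a sketch: the names TIMESTAMP and updates are illustrative, and it assumes every update ends with a single period, as in the sample message.

```python
import re

message = (
    "Aug 10, 17:04 UTCThis is update 1."
    "Aug 10, 15:56 UTCThis is update 2."
    "Aug 10, 15:55 UTCThis is update 3."
)

# Same pattern as above, with non-capturing groups only.
TIMESTAMP = r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC"

parts = re.split(TIMESTAMP, message)

# Drop empty pieces (such as the one before the first timestamp)
# and strip the trailing period from each remaining piece.
updates = [part.rstrip(".") for part in parts if part]

print(updates)
# ['This is update 1', 'This is update 2', 'This is update 3']
```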

As stated in the docs:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
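A small comparison may make this behaviour easier to see. The input string and variable names below are made up for illustration and are not from the question:

```python
import re

text = "a1b2c"

# Capturing group: the matched separators are kept in the result.
with_group = re.split(r"(\d)", text)

# Non-capturing group: the separators are discarded.
without_group = re.split(r"(?:\d)", text)

print(with_group)     # ['a', '1', 'b', '2', 'c']
print(without_group)  # ['a', 'b', 'c']
```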

Aram Becker answered Nov 21 '25 21:11


