I want to move the first occurrence of a date (or, more generally, the first match of a regular expression) to the beginning of my text.
Example:
"I went out on 1 sep 2012 and it was better than 15 jan 2012"
and I want to get
"1 sep 2012, I went out on and it was better than 15 jan 2012"
I was thinking about replacing "1 sep 2012" with ",1 sep 2012," and then cutting the string at ",", but I don't know what to write in place of replace_with:
line = re.sub(r'\d+\s(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\s\d{4}', 'replace_with', line, 1)
Any help?
Use capture groups:
>>> import re
>>> s = "I went out on 1 sep 2012 and it was better than 15 jan 2012"
>>> r = re.compile('(^.*)(1 sep 2012 )(.*$)')
>>> r.sub(r'\2\1\3',s)
'1 sep 2012 I went out on and it was better than 15 jan 2012'
Brackets capture parts of the string:
(^.*) # Capture everything from the start of the string
(1 sep 2012 ) # Up to the part we are interested in (captured)
(.*$) # Capture everything else
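To see what each pair of brackets captures, you can inspect the match object directly. A small sketch using the same string and pattern as above (the names m, before, date, after and reordered are only illustrative):

```python
import re

s = "I went out on 1 sep 2012 and it was better than 15 jan 2012"

# Same three-group pattern as in the session above.
m = re.match(r'(^.*)(1 sep 2012 )(.*$)', s)

# groups() returns the captured pieces in order: \1, \2, \3.
before, date, after = m.groups()
print(repr(before))  # 'I went out on '
print(repr(date))    # '1 sep 2012 '
print(repr(after))   # 'and it was better than 15 jan 2012'

# Reordering the pieces by hand gives the same result as the r'\2\1\3' substitution.
reordered = date + before + after
print(reordered)
```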
Then just reorder the capture groups in the substitution: '\2\1\3'.
Note: referencing the capture groups this way requires a raw string, r'\2\1\3' (or doubled backslashes).
The second group in my example is just the literal string (1 sep 2012 ), but of course it can be any regexp, such as the one you created (with an extra \s on the end):
(\d+\s(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\s\d{4}\s)
>>> r = re.compile(r'(^.*)(\d+\s(?:aug|sep|oct|nov)\s\d{4}\s)(.*$)')
>>> r.sub(r'\2\1\3',s)
'1 sep 2012 I went out on and it was better than 15 jan 2012'
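One caveat worth checking if you use the full month list: the leading (^.*) is greedy, so when more than one date can match, it swallows as much as possible and the second group ends up on the last match, not the first. The session above avoids this only because "jan" is not in the shortened month list. Making the first group lazy, (^.*?), appears to be the simplest fix. A sketch, assuming a slightly extended test string (the trailing " overall" is added here so the second date is also followed by whitespace):

```python
import re

# Hypothetical test string: like the original, but with text after the second date.
s = "I went out on 1 sep 2012 and it was better than 15 jan 2012 overall"

months = r'(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)'
date = r'(\d+\s' + months + r'\s\d{4}\s)'

greedy = re.compile(r'(^.*)' + date + r'(.*$)')   # first group grabs as much as it can
lazy = re.compile(r'(^.*?)' + date + r'(.*$)')    # first group grabs as little as it can

greedy_result = greedy.sub(r'\2\1\3', s)
lazy_result = lazy.sub(r'\2\1\3', s)

print(greedy_result)  # 5 jan 2012 I went out on 1 sep 2012 and it was better than 1overall
print(lazy_result)    # 1 sep 2012 I went out on and it was better than 15 jan 2012 overall
```

The greedy version not only picks the later date, it also splits "15" because \d+ is satisfied by the single digit "5". The lazy version stops at the first place the date pattern can match.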
From docs.python.org:
When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change.
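A quick way to see why the prefix matters here: without it, Python itself interprets '\2' as an octal escape before the re module ever sees it. A minimal sketch (the variable names are illustrative):

```python
import re

# Without the r prefix, '\2' is an octal escape: one character, chr(2).
plain = '\2'
# With the r prefix, the backslash is kept: two characters, '\' and '2'.
raw = r'\2'
print(len(plain), len(raw))  # 1 2

s = "I went out on 1 sep 2012 and it was better than 15 jan 2012"
pattern = re.compile(r'(^.*)(1 sep 2012 )(.*$)')

# Two equivalent ways to pass the backreferences to sub():
with_raw = pattern.sub(r'\2\1\3', s)        # raw string
with_doubled = pattern.sub('\\2\\1\\3', s)  # ordinary string, backslashes doubled
print(with_raw == with_doubled)  # True
```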