I wish to transform the following:
"some text http://one.two.three.source.com more text. more text more text http://source.com more text. more text http://one.source.com more text more text. more text http://one.two.source.com more text more text"
To this:
"some text http://one_two_three.target.com more text. more text more text http://target.com more text. more text http://one.target.com more text more text. more text http://one_two.target.com more text more text"
I want to replace each '.' that separates subdomains with '_' in a large chunk of text. The difficulty is that the replacement depends on whether there are any subdomains at all. I cannot predict the rest of the text, so the transformation must happen only inside URLs.
This is what I have so far:
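```python
import re

src = 'source.com'
dst = 'target.com'
# Optional "s", optional escaped slashes, then up to three optional "xxx." subdomain groups
reMatch = r'http(?P<a>s?):(?P<b>\\?)/(?P<c>\\?)/(?P<d>([^.:/]+\.)?)(?P<e>([^.:/]+\.)?)(?P<f>([^.:/]+\.)?)' + src
p = re.compile(reMatch, re.IGNORECASE)
reReplace = r'http\g<a>:\g<b>/\g<c>/\g<d>\g<e>\g<f>' + dst
result = p.sub(reReplace, content)  # content holds the text to transform
```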
It only replaces 'source.com' with 'target.com' and copies the subdomains (up to 3), but it does not replace the '.' between subdomains with '_'.
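One way to handle any number of subdomains is to let `re.sub` call a function for each match and do the dot-to-underscore replacement there. This is a minimal sketch, assuming subdomain labels contain only letters, digits and hyphens, and that the optional `s` and escaped slashes handled by the attempt above should be kept; the names `rewrite_urls`, `SRC`, `DST` and `URL_RE` are hypothetical:

```python
import re

SRC = "source.com"
DST = "target.com"

# Scheme (http or https, optionally with escaped slashes like "http:\/\/"),
# then any number of "label." subdomain prefixes, then the source domain.
# The negative lookahead stops e.g. "source.community" from matching.
URL_RE = re.compile(
    r"(?P<scheme>https?:\\?/\\?/)"
    r"(?P<subs>(?:[A-Za-z0-9-]+\.)*)"
    + re.escape(SRC)
    + r"(?![A-Za-z0-9-])",
    re.IGNORECASE,
)


def rewrite_urls(text):
    """Replace SRC with DST in matching URLs, joining subdomains with '_'."""

    def repl(m):
        subs = m.group("subs")  # e.g. "one.two." or "" when there are none
        if subs:
            # "one.two." -> "one_two", then re-add the dot before the domain
            subs = subs[:-1].replace(".", "_") + "."
        return m.group("scheme") + subs + DST

    return URL_RE.sub(repl, text)
```

Because the subdomain part is a repeated group rather than three optional groups spelled out by hand, there is no fixed depth limit, and the no-subdomain case (`http://source.com`) simply leaves `subs` empty. Text outside the matched URLs, including sentence-ending periods, is left untouched.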
I built a function that achieves your desired output given your input:
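```python
import re

def special_replace(s):
    # \S*? keeps the lazy match inside a single URL instead of spanning spaces
    p = re.compile(r"(http://\S*?)(\.?source\.com)")
    # split() with capturing groups keeps the matched URL parts in the list
    spl = p.split(s)
    newtext = []
    for text in spl:
        if text.startswith("http://"):
            # scheme + subdomains: turn the separating dots into underscores
            text = text.replace(".", "_")
        elif text.endswith("source.com"):
            # the domain part: swap the source domain for the target
            text = text.replace("source.com", "target.com")
        newtext.append(text)
    return "".join(newtext)
```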
It's not that elegant, but it reaches your goal :)
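The function relies on a documented property of `re.split`: when the pattern contains capturing groups, the text of each group is included in the returned list, interleaved with the unmatched text. A small illustration with a pattern of the same two-group shape, on a made-up shortened input:

```python
import re

p = re.compile(r"(http://\S*?)(\.?source\.com)")
parts = p.split("a http://one.two.source.com b http://source.com c")
# Two groups per match, so the list cycles: text, group 1, group 2, text, ...
print(parts)
# ['a ', 'http://one.two', '.source.com', ' b ', 'http://', 'source.com', ' c']
```

Positions 1, 4, 7, ... hold the URL prefix (group 1) and positions 2, 5, 8, ... hold the domain (group 2), which is why the loop can tell them apart with `startswith`/`endswith`. Checking the position (`i % 3`) instead would be a stricter alternative, since it cannot misclassify ordinary text that happens to start with `http://` or end with `source.com`.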