
Python regex if group-match then put-different-character

Tags: python, regex

I wish to transform the following:

"some text http://one.two.three.source.com more text. more text more text http://source.com more text. more text http://one.source.com more text more text. more text http://one.two.source.com more text more text"

To this:

"some text http://one_two_three.target.com more text. more text more text http://target.com more text. more text http://one.target.com more text more text. more text http://one_two.target.com more text more text"

In a large chunk of text, I want to replace the '.' that separates subdomains with '_'. The difficulty is that the replacement is conditional: a URL may have several subdomains, one, or none at all. I cannot predict the rest of the text, and the transformation needs to occur only inside URL patterns.

This is what I have so far:

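import re

src = 'source.com'
dst = 'target.com'
# Groups d, e and f each capture one optional subdomain label (with its dot);
# groups b and c allow an optional backslash before each slash.
reMatch = r'http(?P<a>s?):(?P<b>\\?)/(?P<c>\\?)/(?P<d>([^.:/]+\.)?)(?P<e>([^.:/]+\.)?)(?P<f>([^.:/]+\.)?)' + src
p = re.compile(reMatch, re.IGNORECASE)
reReplace = r'http\g<a>:\g<b>/\g<c>/\g<d>\g<e>\g<f>' + dst
# content holds the text to transform
result = p.sub(reReplace, content)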

It only replaces 'source.com' with 'target.com' and copies the subdomains (up to 3), but it does not replace '.' with '_' between subdomains.

Oded Golan asked May 29 '26 01:05


1 Answer

I built a function that achieves your desired output given your input:

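import re


def special_replace(s):
    # Group 1: "http://" plus any subdomains (without the trailing dot).
    # Group 2: the optional dot followed by "source.com".
    p = re.compile(r"(http://.*?)(\.?source\.com)")
    # Splitting on a pattern with capture groups keeps the groups in the result.
    spl = p.split(s)
    newtext = []
    for text in spl:
        if text.startswith("http://"):
            # Subdomain part: turn the separating dots into underscores.
            text = text.replace(".", "_")
        elif text.endswith("source.com"):
            # Domain part: swap the source domain for the target domain.
            text = text.replace("source.com", "target.com")
        newtext.append(text)
    return "".join(newtext)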

It's not that elegant but it reaches your goal :).
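
A more compact variant is possible because re.sub also accepts a function as the replacement, so the subdomain part can be rewritten per match. The following is only a sketch under a few assumptions: the name replace_domain, its src/dst parameters and the group names scheme and subs are made up for illustration, only plain http:// and https:// URLs are handled (not the backslash-escaped slashes from the question's pattern), and the lookahead that rejects longer domains such as source.community is an extra guard that was not asked for.

```python
import re


def replace_domain(text, src="source.com", dst="target.com"):
    # Hypothetical helper: scheme, then zero or more dot-terminated
    # subdomain labels, then the literal source domain.
    pattern = re.compile(
        r"(?P<scheme>https?://)"
        r"(?P<subs>(?:[^.:/\s]+\.)*)"
        + re.escape(src)
        + r"(?![\w-])",  # do not match inside a longer domain name
        re.IGNORECASE,
    )

    def fix(match):
        subs = match.group("subs")  # e.g. "one.two." or ""
        if subs:
            # Join the labels with "_" but keep the final dot
            # that separates them from the domain.
            subs = subs[:-1].replace(".", "_") + "."
        return match.group("scheme") + subs + dst

    return pattern.sub(fix, text)


print(replace_domain("see http://one.two.source.com now"))
# see http://one_two.target.com now
```

Because the subdomain labels cannot contain whitespace, a match cannot run from an unrelated URL earlier in the text into a later source.com URL, and text outside the matched URLs is left untouched.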

halex answered May 30 '26 16:05
