I have many addresses like "East 19th Street" or "West 141st Street", and I would like to remove the "th" and the "st" in a single call to re.sub.
re.sub(r"(\d+)st|(\d+)nd|(\d+)rd|(\d+)th", r"\g<1>", "East 19th Street")
doesn't work, because the number is not always captured by the first group.
I could chain several subs, but that is messy. Any help appreciated.
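To see why the attempt above fails, here is a small sketch. With four alternatives, each alternative has its own capture group, so "19th" lands in group 4 and group 1 is None. Since Python 3.5, a reference to an unmatched group in the replacement becomes an empty string. One way to keep the alternation is to pass a function as the replacement and return whichever group matched; `Match.lastindex` is the index of the last capturing group that participated. The helper name `keep_number` is just illustrative.

```python
import re

# The pattern from the question: four alternatives, each with its own group.
pattern = r"(\d+)st|(\d+)nd|(\d+)rd|(\d+)th"

# "19th" is matched by the fourth alternative, so groups 1-3 are None.
m = re.search(pattern, "East 19th Street")
print(m.groups())  # (None, None, None, '19')

# \g<1> refers to an unmatched group here, so it is replaced by "" (Python 3.5+).
print(repr(re.sub(pattern, r"\g<1>", "East 19th Street")))  # 'East  Street'

def keep_number(m):
    # Only one alternative matches, so the last matched group is the number.
    return m.group(m.lastindex)

print(re.sub(pattern, keep_number, "East 19th Street"))   # East 19 Street
print(re.sub(pattern, keep_number, "West 141st Street"))  # West 141 Street
```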
Let's try this:
re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", address)
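A self-contained sketch of this first version, applied to a few sample addresses. The names `ORDINAL` and `strip_ordinal` are hypothetical; the variable is called `address` to avoid shadowing the built-in `str`. A single group holds the number, so `\1` is always the right reference.

```python
import re

# Group 1 captures the number, group 2 the suffix;
# \b requires the suffix to end at a word boundary.
ORDINAL = re.compile(r"(\d+)(st|nd|rd|th)\b")

def strip_ordinal(address):
    # Replace "<number><suffix>" with just "<number>" (group 1).
    return ORDINAL.sub(r"\1", address)

for address in ["East 19th Street", "West 141st Street", "2nd Avenue", "3rd Place"]:
    print(strip_ordinal(address))
```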
or, better, with a lookbehind so no group reference is needed:
re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", address)
The \b prevents things like 21strange from being replaced.
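A quick sketch of the lookbehind version, with and without \b, to show what the word boundary buys. The lookbehind (?<=\d) checks for a digit without consuming it, so the suffix can simply be replaced by an empty string. The sample text is made up for illustration.

```python
import re

# (?<=\d): a digit must precede the suffix, but it is not part of the match.
with_boundary = re.compile(r"(?<=\d)(st|nd|rd|th)\b")
# The same pattern without \b, for comparison.
without_boundary = re.compile(r"(?<=\d)(st|nd|rd|th)")

text = "East 19th Street, 21strange"
print(with_boundary.sub("", text))     # East 19 Street, 21strange
print(without_boundary.sub("", text))  # East 19 Street, 21range
```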
To replace only grammatically correct constructs, you can also try:
re.sub(r"(?<=1\d)th\b|(?<=1)st\b|(?<=2)nd\b|(?<=3)rd\b|(?<=[04-9])th\b", "", address)
This replaces 23rd and 44th but leaves invalid things like 23st intact. I'm not sure this is worth the trouble, though.
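One caveat with that pattern: (?<=1)st only looks at a single digit, so it still strips invalid forms such as 11st, 12nd and 13rd. A possible tightening, sketched below, adds a second, two-character negative lookbehind at the same position ((?<!11), (?<!12), (?<!13)); both lookbehinds are fixed-width, which Python's re requires. The name `VALID_ORDINAL_SUFFIX` is hypothetical.

```python
import re

# Each alternative strips a suffix only when it fits the preceding digits.
VALID_ORDINAL_SUFFIX = re.compile(
    r"(?<=1\d)th\b"          # 10th-19th (also 110th, 111th, ...)
    r"|(?<=1)(?<!11)st\b"    # 1st, 21st, 101st, but not 11st
    r"|(?<=2)(?<!12)nd\b"    # 2nd, 22nd, but not 12nd
    r"|(?<=3)(?<!13)rd\b"    # 3rd, 23rd, but not 13rd
    r"|(?<=[04-9])th\b"      # 4th-9th, 20th, 44th, ...
)

samples = ["East 19th Street", "West 141st Street", "23rd", "44th",
           "23st", "11th", "11st", "12nd"]
for s in samples:
    print(s, "->", VALID_ORDINAL_SUFFIX.sub("", s))
```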