 

python regex: replacing <number>st, <number>nd, <number>th etc. in an address with a single sub

Tags:

python

regex

I have many addresses like "East 19th Street" or "West 141st Street", and I would like to remove the "th" and "st" suffixes with a single call to re.sub.

re.sub(r"(\d+)st|(\d+)nd|(\d+)rd|(\d+)th", r"\g<1>", "East 19th Street")

This doesn't work because the number is not always captured by the first group.
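
To see the problem concretely, here is a small sketch (Python 3, standard re module only) that prints which capturing group matched for each address. Only one of the four groups takes part in any given match, so a fixed reference such as \g<1> is empty whenever the digits were captured by group 2, 3 or 4. The empty-string behaviour for unmatched groups assumes Python 3.5 or later; older versions raise an error instead.

```python
import re

# The asker's pattern: four alternatives, each with its own capturing group.
pattern = re.compile(r"(\d+)st|(\d+)nd|(\d+)rd|(\d+)th")

for text in ["West 141st Street", "East 19th Street"]:
    match = pattern.search(text)
    # Only one group takes part in each match; the others are None.
    print(text, "->", match.groups())

# \g<1> always refers to group 1. For "19th" the digits are in group 4, so
# group 1 is unmatched and (on Python 3.5+) is substituted as an empty string.
print(repr(pattern.sub(r"\g<1>", "East 19th Street")))
```

For "East 19th Street" the whole "19th" disappears, which is why a single group reference cannot work with this alternation.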

I could chain several re.sub calls, but that is dirty. Any help is appreciated.

asked Jan 23 '13 at 11:01 by Mermoz



1 Answer

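Try this, where address is the string to clean:

re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", address)

or, better, use a lookbehind so that only the suffix itself is matched:

re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", address)

The trailing \b (word boundary) prevents matches inside longer words such as 21strange.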
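
A minimal sketch wrapping both patterns in functions, assuming the input is a plain str. The function names strip_ordinal_suffixes and strip_ordinal_suffixes_backref are hypothetical, chosen only for illustration.

```python
import re

# Lookbehind version: match the suffix only when it directly follows a digit
# and ends a word (\b), so "21strange" is left untouched.
ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def strip_ordinal_suffixes(address: str) -> str:
    """Remove st/nd/rd/th after a number, e.g. '19th' -> '19'."""
    return ORDINAL_SUFFIX.sub("", address)


def strip_ordinal_suffixes_backref(address: str) -> str:
    """Same result using a capturing group and the \\1 backreference."""
    return re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", address)


if __name__ == "__main__":
    for text in ["East 19th Street", "West 141st Street", "21strange"]:
        print(strip_ordinal_suffixes(text), "|", strip_ordinal_suffixes_backref(text))
```

The lookbehind version never consumes the digits, so the replacement is simply an empty string and no group numbering is involved at all.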

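To remove only grammatically correct suffixes, you can also try:

re.sub(r"(?<=1\d)th\b|(?<=1)(?<!11)st\b|(?<=2)(?<!12)nd\b|(?<=3)(?<!13)rd\b|(?<=[04-9])th\b", "", address)

This removes the suffixes from 23rd and 44th but leaves invalid forms such as 23st intact; the (?<!11), (?<!12) and (?<!13) lookbehinds also keep 11st, 12nd and 13rd from being treated as valid. I don't know whether this is worth the trouble, though.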
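
A runnable sketch of the strict approach, assuming the same plain-str input. It uses negative lookbehinds (?<!11), (?<!12) and (?<!13) so that 11st, 12nd and 13rd are rejected too (they should be 11th, 12th, 13th). Every lookbehind is fixed-width, as Python's re module requires. The name strip_valid_ordinals is hypothetical.

```python
import re

# Remove a suffix only when it is the correct one for the final digit(s).
STRICT_ORDINAL = re.compile(
    r"(?<=1\d)th\b"          # 10th to 19th, including 11th, 12th, 13th
    r"|(?<=1)(?<!11)st\b"    # 1st, 21st, 101st, but not 11st
    r"|(?<=2)(?<!12)nd\b"    # 2nd, 22nd, 102nd, but not 12nd
    r"|(?<=3)(?<!13)rd\b"    # 3rd, 23rd, but not 13rd
    r"|(?<=[04-9])th\b"      # 4th to 9th, 20th, 40th, ...
)


def strip_valid_ordinals(address: str) -> str:
    """Remove only grammatically correct ordinal suffixes; leave others intact."""
    return STRICT_ORDINAL.sub("", address)


if __name__ == "__main__":
    for text in ["23rd and 44th", "23st", "11st 12nd 13rd", "11th 12th 13th"]:
        print(strip_valid_ordinals(text))
```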

answered Nov 14 '22 at 21:11 by georg
