I have three strings that each contain a street name and an apartment number:
"32 Syndicate street", "Street 45 No 100" and "15, Tom and Jerry Street"
Here,
"32 Syndicate street" -> {"street name": "Syndicate street", "apartment number": "32"}
"Street 45 No 100" -> {"street name": "Street 45", "apartment number": "No 100"}
"15, Tom and Jerry Street" -> {"street name": "Tom and Jerry Street", "apartment number": "15"}
I am trying to use Python's regex to get the street names and apartment numbers separately. This is my current code, which is having problems:
import re 
for i in ["32 Syndicate street","Street 45 No 100","15, Tom and Jerry Street"]:
    ###--- write patterns for street names
    pattern_street = re.compile(r'([A-Za-z]+\s?\w+ | [A-Za-z]+\s?[A-Za-z]+\s?[A-Za-z]+\s? | [A-Za-z]+\s?)') 
    match_street = pattern_street.search(i) 
    
    ###--- write patterns for apartment numbers
    pattern_aptnum = re.compile(r'(^\d+\s? | [A-Za-z]+[\s?]+[0-9]+$)') 
    match_aptnum = pattern_aptnum.search(i)
    fin_street = match_street[0] ##--> final street name
    fin_aptnum = match_aptnum[0] ##--> final apartment name 
    print("street--",fin_street)
    print("apartmentnumber--",fin_aptnum)
I get the following output:
street--  Syndicate street 
apartmentnumber-- 32 
street-- Street 45 
apartmentnumber--  No 100
I have two problems:
street--  Syndicate street and apartmentnumber--  No 100
You may get the apartment number using
^\d+|\bNo\s*\d+
See the regex demo. The ^\d+|\bNo\s*\d+ regex matches either one or more digits at the start of string, or No, zero or more whitespaces and then one or more digits.
To capture the street information, you can use
^\d+,?\s*(.*)|^(.*?)\s+No\s*\d+
See this regex demo. Details:
^\d+,?\s*(.*) - start of string, one or more digits, an optional comma, 0+ whitespaces, and then any zero or more chars other than line break chars, as many as possible, captured into Group 1
| - or
^(.*?)\s+No\s*\d+ - start of string, any zero or more chars other than line break chars, as few as possible, captured into Group 2, then 1+ whitespaces, No, 0+ whitespaces, and then 1+ digits.
In Python, never compile regexps inside a for loop; do it before the loop. See the Python demo:
import re 
pattern_aptnum = re.compile(r'^\d+|\bNo\s*\d+')
pattern_street = re.compile(r'^\d+,?\s*(.*)|^(.*?)\s+No\s*\d+') 
for i in ["32 Syndicate street","Street 45 No 100","15, Tom and Jerry Street"]:
    fin_street = ""
    fin_aptnum = ""
    print("String:", i)
    match_street = pattern_street.search(i)
    if match_street:
        fin_street = match_street.group(1) or match_street.group(2)
    match_aptnum = pattern_aptnum.search(i)
    if match_aptnum:
        fin_aptnum = match_aptnum.group()
    print("street--",fin_street)
    print("apartmentnumber--",fin_aptnum)
Output:
String: 32 Syndicate street
street-- Syndicate street
apartmentnumber-- 32
String: Street 45 No 100
street-- Street 45
apartmentnumber-- No 100
String: 15, Tom and Jerry Street
street-- Tom and Jerry Street
apartmentnumber-- 15
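If you want the dictionary shape shown in the question rather than printed lines, the same two patterns can be wrapped in a small helper. This is only a sketch: the name parse_address and the empty-string fallback for a failed match are assumptions, not part of the original answer.

```python
import re

# Same patterns as above, compiled once at module level.
PATTERN_APTNUM = re.compile(r'^\d+|\bNo\s*\d+')
PATTERN_STREET = re.compile(r'^\d+,?\s*(.*)|^(.*?)\s+No\s*\d+')

def parse_address(text):
    """Hypothetical helper: split an address string into its two parts."""
    street = ""
    aptnum = ""
    match_street = PATTERN_STREET.search(text)
    if match_street:
        # Only one of the two groups participates in a given match.
        street = match_street.group(1) or match_street.group(2) or ""
    match_aptnum = PATTERN_APTNUM.search(text)
    if match_aptnum:
        aptnum = match_aptnum.group()
    return {"street name": street, "apartment number": aptnum}

for s in ["32 Syndicate street", "Street 45 No 100", "15, Tom and Jerry Street"]:
    print(parse_address(s))
```

Strings that match neither alternative come back with empty values instead of raising an error, which may or may not be what you want for real address data.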
Use re.compile(..., re.X) if you want to use whitespace freely in the regex. Also note that print() inserts a space between its arguments by default.
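To illustrate the whitespace point, here is a minimal sketch comparing the apartment-number pattern written in verbose mode with a variant that has literal spaces around the alternation. The variable names and the simplified second pattern are illustrative assumptions, not the asker's exact regex.

```python
import re

# With re.X, whitespace and "#" comments inside the pattern are ignored.
pattern_verbose = re.compile(r"""
    ^\d+          # one or more digits at the start of the string
    |             # or
    \bNo\s*\d+    # "No", optional whitespace, then one or more digits
""", re.X)

# Without re.X, the spaces around "|" are literal and must be matched.
pattern_literal_spaces = re.compile(r'^\d+ | No\s*\d+')

sample = "Street 45 No 100"
verbose_result = pattern_verbose.search(sample).group()
literal_result = pattern_literal_spaces.search(sample).group()

print(repr(verbose_result))   # 'No 100'
print(repr(literal_result))   # ' No 100' (note the leading space)

# print() separates its arguments with a space unless sep is changed.
print("street--", verbose_result, sep="")
```

The leading space in the second result is the same kind of stray whitespace that shows up in the question's output.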