I have three strings that each contain a street name and an apartment number: "32 Syndicate street", "Street 45 No 100", and "15, Tom and Jerry Street".
The expected results are:
"32 Syndicate street" -> {"street name": "Syndicate street", "apartment number": "32"}
"Street 45 No 100" -> {"street name": "Street 45", "apartment number": "No 100"}
"15, Tom and Jerry Street" -> {"street name": "Tom and Jerry Street", "apartment number": "15"}
I am trying to use Python's regex to get the street names and apartment numbers separately. This is my current code, which is having problems:
import re
for i in ["32 Syndicate street","Street 45 No 100","15, Tom and Jerry Street"]:
    ###--- write patterns for street names
    pattern_street = re.compile(r'([A-Za-z]+\s?\w+ | [A-Za-z]+\s?[A-Za-z]+\s?[A-Za-z]+\s? | [A-Za-z]+\s?)')
    match_street = pattern_street.search(i)
    ###--- write patterns for apartment numbers
    pattern_aptnum = re.compile(r'(^\d+\s? | [A-Za-z]+[\s?]+[0-9]+$)')
    match_aptnum = pattern_aptnum.search(i)
    fin_street = match_street[0] ##--> final street name
    fin_aptnum = match_aptnum[0] ##--> final apartment name
    print("street--",fin_street)
    print("apartmentnumber--",fin_aptnum)
I get the following output:
street-- Syndicate street
apartmentnumber-- 32
street-- Street 45
apartmentnumber-- No 100
I have two problems:
street-- Syndicate street
and apartmentnumber-- No 100
You may get the apartment number using
^\d+|\bNo\s*\d+
See the regex demo. The ^\d+|\bNo\s*\d+ regex matches either one or more digits at the start of the string, or No, zero or more whitespace characters, and then one or more digits.
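As a quick check of the apartment-number pattern on its own, here is a minimal sketch. The helper name `find_apartment` is made up for illustration and is not part of the original answer.

```python
import re

# Pattern from the answer: digits at the start of the string,
# or the word "No" followed by optional whitespace and digits.
APT_RE = re.compile(r'^\d+|\bNo\s*\d+')

def find_apartment(address):
    """Return the apartment part of the address, or None if nothing matches."""
    m = APT_RE.search(address)
    return m.group() if m else None

for s in ["32 Syndicate street", "Street 45 No 100", "15, Tom and Jerry Street"]:
    print(repr(s), "->", find_apartment(s))
```

Returning None for a non-matching string lets the caller decide how to handle addresses that do not follow either format.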
To capture the street information, you can use
^\d+,?\s*(.*)|^(.*?)\s+No\s*\d+
See this regex demo. Details:
^\d+,?\s*(.*) - start of string, one or more digits, an optional comma, 0+ whitespaces, and then any zero or more chars other than line break chars, as many as possible, captured into Group 1
| - or
^(.*?)\s+No\s*\d+ - start of string, any zero or more chars other than line break chars, as few as possible, captured into Group 2, then 1+ whitespaces, No, 0+ whitespaces, and then 1+ digits.
In Python, avoid compiling regexps inside a for loop; compile them once before the loop. See the Python demo:
import re
pattern_aptnum = re.compile(r'^\d+|\bNo\s*\d+')
pattern_street = re.compile(r'^\d+,?\s*(.*)|^(.*?)\s+No\s*\d+')
for i in ["32 Syndicate street","Street 45 No 100","15, Tom and Jerry Street"]:
    fin_street = ""
    fin_aptnum = ""
    print("String:", i)
    match_street = pattern_street.search(i)
    if match_street:
        fin_street = match_street.group(1) or match_street.group(2)
    match_aptnum = pattern_aptnum.search(i)
    if match_aptnum:
        fin_aptnum = match_aptnum.group()
    print("street--",fin_street)
    print("apartmentnumber--",fin_aptnum)
Output:
String: 32 Syndicate street
street-- Syndicate street
apartmentnumber-- 32
String: Street 45 No 100
street-- Street 45
apartmentnumber-- No 100
String: 15, Tom and Jerry Street
street-- Tom and Jerry Street
apartmentnumber-- 15
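Since the question asks for results shaped like {"street name": ..., "apartment number": ...}, the two patterns above could be wrapped in a small function. This is only a sketch; the name `parse_address` and the choice to return empty strings when nothing matches are assumptions, not part of the original answer.

```python
import re

# Both patterns are taken from the answer and compiled once, outside any loop.
APT_RE = re.compile(r'^\d+|\bNo\s*\d+')
STREET_RE = re.compile(r'^\d+,?\s*(.*)|^(.*?)\s+No\s*\d+')

def parse_address(address):
    """Split an address into street name and apartment number (hypothetical helper)."""
    street_match = STREET_RE.search(address)
    apt_match = APT_RE.search(address)
    street = ""
    if street_match:
        # Only one of the two groups participates in a match; the other is None.
        street = street_match.group(1) or street_match.group(2) or ""
    apartment = apt_match.group() if apt_match else ""
    return {"street name": street, "apartment number": apartment}

for s in ["32 Syndicate street", "Street 45 No 100", "15, Tom and Jerry Street"]:
    print(s, "->", parse_address(s))
```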
Use re.compile(... , re.X) if you want to use whitespace freely in the regex; without that flag, the spaces around | in the question's patterns are matched literally.
print() inserts a space by default between its arguments.
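To illustrate the point about whitespace in patterns, the sketch below compares a pattern with literal spaces around | against the same pattern compiled with re.X. The variable names are illustrative only, and the pattern is the apartment-number regex from the answer rather than the one from the question.

```python
import re

# Without re.X, the spaces around "|" are literal characters that must be matched.
literal_spaces = re.compile(r'^\d+ | \bNo\s*\d+')

# With re.X (re.VERBOSE), unescaped whitespace and "#" comments in the pattern are ignored.
verbose = re.compile(r'''
    ^\d+          # digits at the start of the string
    |             # or
    \bNo\s*\d+    # the word No, optional whitespace, digits
''', re.X)

s = "15, Tom and Jerry Street"
print(literal_spaces.search(s))    # no match: the pattern needs a space right after the digits
print(verbose.search(s).group())   # matches the leading digits

# print() separates its arguments with a space unless sep is given.
print("street--", "Tom and Jerry Street")
print("street--", "Tom and Jerry Street", sep="")
```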