
Python regex compile and search strings with numbers and words

I have three strings, each containing a street name and an apartment number.

"32 Syndicate street", "Street 45 No 100" and "15, Tom and Jerry Street"

The expected results are:

"32 Syndicate street" -> {"street name": "Syndicate street", "apartment number": "32"}
"Street 45 No 100" -> {"street name": "Street 45", "apartment number": "No 100"}
"15, Tom and Jerry Street" -> {"street name": "Tom and Jerry Street", "apartment number": "15"}

I am trying to use Python's regex to get the street names and apartment numbers separately. This is my current code, which is having problems:

import re 
for i in ["32 Syndicate street","Street 45 No 100","15, Tom and Jerry Street"]:
    ###--- write patterns for street names
    pattern_street = re.compile(r'([A-Za-z]+\s?\w+ | [A-Za-z]+\s?[A-Za-z]+\s?[A-Za-z]+\s? | [A-Za-z]+\s?)') 
    match_street = pattern_street.search(i) 
    
    ###--- write patterns for apartment numbers
    pattern_aptnum = re.compile(r'(^\d+\s? | [A-Za-z]+[\s?]+[0-9]+$)') 
    match_aptnum = pattern_aptnum.search(i)

    fin_street = match_street[0] ##--> final street name
    fin_aptnum = match_aptnum[0] ##--> final apartment name 

    print("street--",fin_street)
    print("apartmentnumber--",fin_aptnum)

I get the following output:

street--  Syndicate street 
apartmentnumber-- 32 
street-- Street 45 
apartmentnumber--  No 100

I have two problems:

  1. I am not able to get the apartment number "15" for the final string.
  2. Why is there a space at the beginning of street-- Syndicate street and apartmentnumber-- No 100?
Srivatsan asked Aug 29 '20 16:08



2 Answers

You may get the apartment number using

^\d+|\bNo\s*\d+

See the regex demo. The ^\d+|\bNo\s*\d+ regex matches either one or more digits at the start of string, or No, zero or more whitespaces and then one or more digits.

To capture the street information, you can use

^\d+,?\s*(.*)|^(.*?)\s+No\s*\d+

See this regex demo. Details:

  • ^\d+,?\s*(.*) - start of string, one or more digits, an optional comma, 0+ whitespaces and then any zero or more chars other than line break chars as many as possible captured into Group 1
  • | - or
  • ^(.*?)\s+No\s*\d+ - start of string, any zero or more chars other than line break chars as few as possible captured into Group 2, 1+ whitespaces, No, 0+ whitespaces, and then 1+ digits.

In Python, compile regexps once before the for loop rather than inside it. See the Python demo:

import re 

pattern_aptnum = re.compile(r'^\d+|\bNo\s*\d+')
pattern_street = re.compile(r'^\d+,?\s*(.*)|^(.*?)\s+No\s*\d+') 
for i in ["32 Syndicate street","Street 45 No 100","15, Tom and Jerry Street"]:
    fin_street = ""
    fin_aptnum = ""
    print("String:", i)
    match_street = pattern_street.search(i)
    if match_street:
        fin_street = match_street.group(1) or match_street.group(2)
    match_aptnum = pattern_aptnum.search(i)
    if match_aptnum:
        fin_aptnum = match_aptnum.group()

    print("street--",fin_street)
    print("apartmentnumber--",fin_aptnum)

Output:

String: 32 Syndicate street
street-- Syndicate street
apartmentnumber-- 32
String: Street 45 No 100
street-- Street 45
apartmentnumber-- No 100
String: 15, Tom and Jerry Street
street-- Tom and Jerry Street
apartmentnumber-- 15
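
The question asked for each string as a dictionary. A minimal sketch of that, reusing the two patterns from this answer, might look like the following. The function name parse_address and the empty-string fallback for input that does not match are assumptions made for illustration, not part of the original answer.

```python
import re

# Patterns from the answer, compiled once at module level.
PATTERN_APTNUM = re.compile(r'^\d+|\bNo\s*\d+')
PATTERN_STREET = re.compile(r'^\d+,?\s*(.*)|^(.*?)\s+No\s*\d+')


def parse_address(text):
    """Hypothetical helper: split an address into street name and apartment number."""
    street = ""
    aptnum = ""
    match_street = PATTERN_STREET.search(text)
    if match_street:
        # Only one of the two groups participates, depending on which alternative matched.
        street = match_street.group(1) or match_street.group(2) or ""
    match_aptnum = PATTERN_APTNUM.search(text)
    if match_aptnum:
        aptnum = match_aptnum.group()
    return {"street name": street, "apartment number": aptnum}


for address in ["32 Syndicate street", "Street 45 No 100", "15, Tom and Jerry Street"]:
    print(parse_address(address))
```

Strings that fit neither pattern come back with empty values instead of raising, which avoids the None subscript error that the original loop would hit.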
Wiktor Stribiżew answered Sep 30 '22 21:09



  1. Use re.compile(..., re.X) if you want to use whitespace freely in the regex. Without it, the spaces around | in your patterns are literal and must match a space in the input.
  2. print() inserts a space between its arguments by default.
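
Both points can be checked with a short sketch that compiles the apartment pattern from the question with and without re.X. The helper name first_match is hypothetical and exists only to keep the example compact.

```python
import re

# The apartment-number pattern exactly as written in the question.
APT_PATTERN = r'(^\d+\s? | [A-Za-z]+[\s?]+[0-9]+$)'

literal = re.compile(APT_PATTERN)        # spaces around "|" must match literally
verbose = re.compile(APT_PATTERN, re.X)  # whitespace outside character classes is ignored


def first_match(pattern, text):
    """Hypothetical helper: return the matched text, or None if there is no match."""
    m = pattern.search(text)
    return m[0] if m else None


print(repr(first_match(literal, "Street 45 No 100")))          # ' No 100'
print(repr(first_match(verbose, "Street 45 No 100")))          # 'No 100'
print(repr(first_match(literal, "15, Tom and Jerry Street")))  # None
print(repr(first_match(verbose, "15, Tom and Jerry Street")))  # '15'

# print() separates its arguments with sep, which defaults to a single space.
print("street--", "Syndicate street")          # street-- Syndicate street
print("street--", "Syndicate street", sep="")  # street--Syndicate street
```

Without re.X, the first alternative requires a literal space after the digits, so "15," never matches, and the second alternative starts with a literal space, which ends up in the result.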
Gribouillis answered Sep 30 '22 21:09
