
Python regex won't work the way I want it to

Tags:

python

regex

I'm trying to filter street names and get the parts that I want. The names come in several formats. Here are some examples and what I want from them.

Car Cycle 5 B Ap 1233       < what I have
Car Cycle 5 B               < what I want

Potato street 13 1 AB       < what I have
Potato street 13            < what I want

Chrome Safari 41 Ap 765     < what I have
Chrome Safari 41            < what I want

Highstreet 53 Ap 2632/BH    < what I have
Highstreet 53               < what I want

Something street 91/Daniel  < what I have
Something street 91         < what I want

Usually what I want is the street name (1-4 names) followed by the street number if there is one and then the street letter (1 letter) if there is one. I just can't get it to work right.

Here is my code (I know, it sucks):

import re

def address_regex(address):
    # Patterns ordered from most to least specific.
    regex1 = re.compile(r"(\w+ ){1,4}(\d{1,4} ){1}(\w{1} )")  # name, number, letter
    regex2 = re.compile(r"(\w+ ){1,4}(\d{1,4} ){1}")          # name, number
    regex3 = re.compile(r"(\w+ ){1,4}(\d){1,4}")              # name, trailing number
    regex4 = re.compile(r"(\w+ ){1,4}(\w+)")                  # name only

    # Try every pattern against the address passed in.
    s1 = regex1.search(address)
    s2 = regex2.search(address)
    s3 = regex3.search(address)
    s4 = regex4.search(address)

    regex_address = ""

    # Keep the first pattern that matched; fall back to the raw address.
    if s1 is not None:
        regex_address = s1.group()
    elif s2 is not None:
        regex_address = s2.group()
    elif s3 is not None:
        regex_address = s3.group()
    elif s4 is not None:
        regex_address = s4.group()
    else:
        regex_address = address

    return regex_address

I'm using Python 3.4

Asked by ZeZe, Nov 26 '25 20:11


1 Answer

I'm going to go out on a limb here and assume in your last example you actually want to catch the number 91, because it makes no sense not to.

Here's a solution which catches all your examples (and your last, but including the 91):

^([\p{L} ]+ \d{1,4}(?: ?[A-Za-z])?\b)
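  • ^ Start the match at the beginning of the string
  • [\p{L} ]+ Character class of a space or any Unicode character in the "letter" category, one or more times
  • \d{1,4} A digit, 1-4 times (preceded by a literal space)
  • (?: ?[A-Za-z])? Non-capturing group of an optional space and a single letter, 0-1 times
  • \b Word boundary, so a trailing letter is only kept when it stands alone (the "A" in "Ap" is not taken)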

Capture group 1 is the entire address. I didn't quite understand the logic behind your grouping, but feel free to group it however you prefer.
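
One caveat if you run this from Python: the built-in re module does not understand \p{L} (the third-party regex module does). Below is a rough sketch of the same idea using only re, where I've swapped \p{L} for [^\W\d_], a common approximation of "any Unicode letter". The function name extract_address and the fall-back to the unchanged input are my own assumptions, not something from the question.

```python
import re

# Assumption: [^\W\d_] (a word character that is neither a digit nor "_")
# stands in for \p{L}, which the built-in re module does not support.
ADDRESS_RE = re.compile(r"^((?:[^\W\d_]| )+ \d{1,4}(?: ?[A-Za-z])?\b)")


def extract_address(address):
    """Return street name, number and optional single letter.

    Hypothetical helper: returns the input unchanged when nothing matches.
    """
    match = ADDRESS_RE.search(address)
    return match.group(1) if match else address


if __name__ == "__main__":
    examples = [
        "Car Cycle 5 B Ap 1233",
        "Potato street 13 1 AB",
        "Chrome Safari 41 Ap 765",
        "Highstreet 53 Ap 2632/BH",
        "Something street 91/Daniel",
    ]
    for raw in examples:
        print(repr(raw), "->", repr(extract_address(raw)))
```

An address without a street number (for example just "Highstreet") does not match at all, so the sketch hands it back as-is; adjust that if you would rather get None.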


Answered by ohaal, Nov 29 '25 08:11


