
Regex: Parse streetname/number

C#/.NET 2.0

I need to parse a string containing the street name and the house number into two separate values.

in: "Streetname 1a"         out:  "streetname"  "1a"
    "Street name 1a"              "street name" "1a"
    "Street name 1 a"             "street name" "1 a"

My first idea was to split the string at the first " " character, but that does not work for the second case.

result[0] = trimmedInput.Substring(0, splitPosition).Trim();
result[1] = trimmedInput.Substring(splitPosition + 1).Trim();

What is the best way to do this? Can I use regular expressions?

Thanks

thedev asked Feb 16 '11 09:02


3 Answers

^(.+)\s(\S+)$ should do the trick

EDIT: this will work if the house number can't contain spaces. Otherwise the problem can't be solved programmatically, since the program can never know the semantics of the string tokens.
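
A minimal sketch of how that pattern could be applied in C#, assuming the input is a single line. The names `LastTokenSplitter` and `Split` are made up for illustration. Everything after the last whitespace character is treated as the house number, so "Street name 1 a" comes out as "Street name 1" and "a", which is the limitation described above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastTokenSplitter
{
    // Group 1: everything before the last whitespace character.
    // Group 2: the last whitespace-free token.
    private static readonly Regex Pattern = new Regex(@"^(.+)\s(\S+)$");

    // Returns { street, number }, or null when there is nothing to split on.
    public static string[] Split(string input)
    {
        Match m = Pattern.Match(input.Trim());
        if (!m.Success)
        {
            return null;
        }
        return new string[] { m.Groups[1].Value.Trim(), m.Groups[2].Value };
    }

    public static void Main()
    {
        string[] samples = { "Streetname 1a", "Street name 1a", "Street name 1 a" };
        foreach (string s in samples)
        {
            string[] parts = Split(s);
            Console.WriteLine("\"{0}\" -> \"{1}\" | \"{2}\"", s, parts[0], parts[1]);
        }
    }
}
```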

House addresses are messy and inconsistent. I worked with address data and honestly, if you don't have the data in normalized form, you're basically screwed.

^(.+)\s(\d+(\s*[^\d\s]+)*)$ will cover some more cases, but a pattern like that is a can of worms if I ever saw one.
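
As a rough illustration (the class name `DigitAnchoredSplitter` and the sample inputs are assumptions, not part of the question), the extended pattern could be tried out like this. It accepts a number followed by non-digit tokens such as "1 a", but an address written number-first does not match at all, which is one of the worms in the can.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitAnchoredSplitter
{
    // Group 2 must start with digits and may continue with non-digit
    // tokens, optionally separated by whitespace ("1a", "1 a", "12 bis").
    private static readonly Regex Pattern =
        new Regex(@"^(.+)\s(\d+(\s*[^\d\s]+)*)$");

    // Returns { street, number }, or null when the pattern does not match.
    public static string[] Split(string input)
    {
        Match m = Pattern.Match(input.Trim());
        if (!m.Success)
        {
            return null;
        }
        return new string[] { m.Groups[1].Value.Trim(), m.Groups[2].Value };
    }

    public static void Main()
    {
        // The last sample is number-first and is expected not to match.
        string[] samples = { "Street name 1 a", "Street name 12 bis", "3214 N University Ave" };
        foreach (string s in samples)
        {
            string[] parts = Split(s);
            if (parts == null)
            {
                Console.WriteLine("\"{0}\" -> no match", s);
            }
            else
            {
                Console.WriteLine("\"{0}\" -> \"{1}\" | \"{2}\"", s, parts[0], parts[1]);
            }
        }
    }
}
```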

Dyppl answered Nov 02 '22 03:11


You have to define the pattern you're looking for more clearly, assuming there even is one. There need to be some general observations you can make that will always hold:

  • A street address consists of a name and a number.
  • The name always appears first.
  • The name consists of one or more words, separated by spaces.
  • The number is a number followed by an optional letter.

From a comment, the last point isn't strictly true because the number & letter portion of the street number can be separated by whitespace.

If you can't guarantee the order of the street name & number, and also that the words in the street name do not contain numbers, then I'm not really sure that anything is going to help you.

The following regex should cover most cases:

Regex reggie = new Regex(@"^(?<name>\w[\s\w]+?)\s*(?<num>\d+\s*[a-z]?)$", RegexOptions.IgnoreCase);
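
A short sketch of how the named groups could be read back, written without C# 3.0 features so it should also compile for .NET 2.0. The wrapper class `StreetAddressParser` and its `Parse` method are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StreetAddressParser
{
    // Same pattern as above: a lazily matched name, then digits with an
    // optional single letter that may be separated by whitespace.
    private static readonly Regex Reggie = new Regex(
        @"^(?<name>\w[\s\w]+?)\s*(?<num>\d+\s*[a-z]?)$",
        RegexOptions.IgnoreCase);

    // Returns { name, number }, or null when the input does not fit the pattern.
    public static string[] Parse(string input)
    {
        Match m = Reggie.Match(input.Trim());
        if (!m.Success)
        {
            return null;
        }
        return new string[] { m.Groups["name"].Value, m.Groups["num"].Value };
    }

    public static void Main()
    {
        string[] samples = { "Streetname 1a", "Street name 1a", "Street name 1 a" };
        foreach (string s in samples)
        {
            string[] parts = Parse(s);
            Console.WriteLine("\"{0}\" -> \"{1}\" | \"{2}\"", s, parts[0], parts[1]);
        }
    }
}
```

Because the name group is lazy, it stops at the first position from which the rest of the string can be read as a house number, so "Street name 1 a" yields "Street name" and "1 a".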
Quick Joe Smith answered Nov 02 '22 04:11


As Dyppl stated, street addresses are messy. But, if your address data represents US addresses and you have the complete address (including city, state, and/or ZIP Code) you could use an address verification service to parse (and verify!) and standardize the components. I work for SmartyStreets, an address verification provider. Here's a quick C# example I wrote a while back that calls our LiveAddress API:

https://github.com/smartystreets/LiveAddressSamples/blob/master/c-sharp/street-address.cs

Here's the resulting output for that example (notice that the street name and primary number are parsed in the "components" section):

[
    {
        "input_index": 0,
        "candidate_index": 0,
        "delivery_line_1": "3214 N University Ave",
        "last_line": "Provo UT 84604-4405",
        "delivery_point_barcode": "846044405140",
        "components": {
            "primary_number": "3214",
            "street_predirection": "N",
            "street_name": "University",
            "street_suffix": "Ave",
            "city_name": "Provo",
            "state_abbreviation": "UT",
            "zipcode": "84604",
            "plus4_code": "4405",
            "delivery_point": "14",
            "delivery_point_check_digit": "0"
        },
        "metadata": {
            "record_type": "S",
            "county_fips": "49049",
            "county_name": "Utah",
            "carrier_route": "C016",
            "congressional_district": "03",
            "latitude": 40.27586,
            "longitude": -111.6576,
            "precision": "Zip9"
        },
        "analysis": {
            "dpv_match_code": "Y",
            "dpv_footnotes": "AABBR1",
            "dpv_cmra": "Y",
            "dpv_vacant": "N",
            "ews_match": false
        }
    }
]
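
For illustration only, here is one way the "components" section of a response like this could be read. The sketch assumes `System.Text.Json` is available (.NET Core 3.0 or later, so not the .NET 2.0 from the question); the class `ComponentsReader` is a made-up name, and the JSON is a trimmed copy of the output above rather than a live API call.

```csharp
using System;
using System.Text.Json;

public static class ComponentsReader
{
    // Trimmed copy of the response shown above; only the fields read below are kept.
    public const string SampleJson = @"[
      {
        ""delivery_line_1"": ""3214 N University Ave"",
        ""components"": {
          ""primary_number"": ""3214"",
          ""street_predirection"": ""N"",
          ""street_name"": ""University"",
          ""street_suffix"": ""Ave""
        }
      }
    ]";

    // Returns { primary_number, street_name } from the first candidate in the array.
    public static string[] ReadNumberAndStreet(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement components = doc.RootElement[0].GetProperty("components");
            return new string[]
            {
                components.GetProperty("primary_number").GetString(),
                components.GetProperty("street_name").GetString()
            };
        }
    }

    public static void Main()
    {
        string[] parts = ReadNumberAndStreet(SampleJson);
        Console.WriteLine("number: {0}, street: {1}", parts[0], parts[1]);
    }
}
```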

We provide an absolutely free subscription for low-usage users. Here's a link that explains all the fields:

http://wiki.smartystreets.com/liveaddress_api_users_guide#json-responses

EDIT: included latitude/longitude fields (newly released).

mdwhatcott answered Nov 02 '22 03:11