C#/.NET 2.0
I need to parse a string containing a street name and a house number into two separate values.
in: "Streetname 1a" out: "streetname" "1a"
in: "Street name 1a" out: "street name" "1a"
in: "Street name 1 a" out: "street name" "1 a"
My first attempt was to split the string at a " " character, but that does not work for the second case.
result[0] = trimmedInput.Substring(0, splitPosition).Trim();
result[1] = trimmedInput.Substring(splitPosition + 1).Trim();
What is the best way to do this? Can I use regular expressions?
Thanks
^(.+)\s(\S+)$
should do the trick
EDIT: this will work if the house number can't have spaces in it. Otherwise the problem can't be solved programmatically, since the program will never know the semantics of the string tokens.
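A minimal sketch of how this pattern could be applied in C#, assuming the input is a single line of text. The class and method names (AddressSplitter, Split) are made up for illustration. Because (.+) is greedy, the split lands on the last whitespace, so the third example from the question still comes out wrong:

```csharp
using System;
using System.Text.RegularExpressions;

static class AddressSplitter
{
    // Pattern from the answer: greedy street name, one whitespace, last token.
    static readonly Regex LastToken = new Regex(@"^(.+)\s(\S+)$");

    // Returns { street, number }, or null when the input has no whitespace to split on.
    public static string[] Split(string input)
    {
        Match m = LastToken.Match(input.Trim());
        if (!m.Success)
            return null;
        return new string[] { m.Groups[1].Value, m.Groups[2].Value };
    }
}

// AddressSplitter.Split("Street name 1a")  -> "Street name", "1a"
// AddressSplitter.Split("Street name 1 a") -> "Street name 1", "a"  (not what the question wants)
```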
House addresses are messy and inconsistent. I worked with address data and honestly, if you don't have the data in normalized form, you're basically screwed.
^(.+)\s(\d+(\s*[^\d\s]+)*)$
will cover some more cases, but a pattern like that is a can of worms if I ever saw one.
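As a rough illustration, the second pattern could be wrapped like this. The helper name (HouseNumberSplitter.TrySplit) is hypothetical, and the sketch assumes the house number always starts with a digit and comes last:

```csharp
using System;
using System.Text.RegularExpressions;

static class HouseNumberSplitter
{
    // Street name, one whitespace, then digits optionally followed by
    // whitespace-separated non-digit suffixes such as "a" or " a".
    static readonly Regex Pattern = new Regex(@"^(.+)\s(\d+(\s*[^\d\s]+)*)$");

    public static bool TrySplit(string input, out string street, out string number)
    {
        Match m = Pattern.Match(input.Trim());
        street = m.Success ? m.Groups[1].Value : null;
        number = m.Success ? m.Groups[2].Value : null;
        return m.Success;
    }
}

// TrySplit("Street name 1 a", ...) yields street = "Street name", number = "1 a"
// TrySplit("Main Street", ...) returns false because no trailing number is present
```

Inputs where the street name itself ends in a number are exactly where this starts to go wrong, which is the can of worms mentioned above.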
You have to define the pattern you're looking for more clearly, assuming there even is one. There need to be some general observations you can make that will always hold:
From a comment, the last point isn't strictly true because the number & letter portion of the street number can be separated by whitespace.
If you can't guarantee the order of the street name & number, and also that the words in the street name do not contain numbers, then I'm not really sure that anything is going to help you.
The following regex should cover most cases:
Regex reggie = new Regex(@"^(?<name>\w[\s\w]+?)\s*(?<num>\d+\s*[a-z]?)$", RegexOptions.IgnoreCase);
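A short usage sketch for the named groups, assuming the street name comes first and the number is digits followed by at most one letter. The wrapper class and method (NamedGroupExample, TryParse) are illustrative names, not part of any library:

```csharp
using System;
using System.Text.RegularExpressions;

static class NamedGroupExample
{
    static readonly Regex Reggie = new Regex(
        @"^(?<name>\w[\s\w]+?)\s*(?<num>\d+\s*[a-z]?)$",
        RegexOptions.IgnoreCase);

    public static bool TryParse(string input, out string name, out string num)
    {
        Match m = Reggie.Match(input.Trim());
        // Named groups are read by name rather than by position.
        name = m.Success ? m.Groups["name"].Value : null;
        num = m.Success ? m.Groups["num"].Value : null;
        return m.Success;
    }
}

// TryParse("Street name 1 a", ...) yields name = "Street name", num = "1 a"
// TryParse("Streetname 1a", ...)   yields name = "Streetname",  num = "1a"
```

The lazy quantifier in the name group stops at the shortest street name that still lets the rest of the pattern match, which is why the whitespace before the number does not end up in the name.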
As Dyppl stated, street addresses are messy. But, if your address data represents US addresses and you have the complete address (including city, state, and/or ZIP Code) you could use an address verification service to parse (and verify!) and standardize the components. I work for SmartyStreets, an address verification provider. Here's a quick C# example I wrote a while back that calls our LiveAddress API:
https://github.com/smartystreets/LiveAddressSamples/blob/master/c-sharp/street-address.cs
Here's the resulting output for that example (notice that the street name and primary number are parsed in the "components" section):
[
    {
        "input_index": 0,
        "candidate_index": 0,
        "delivery_line_1": "3214 N University Ave",
        "last_line": "Provo UT 84604-4405",
        "delivery_point_barcode": "846044405140",
        "components": {
            "primary_number": "3214",
            "street_predirection": "N",
            "street_name": "University",
            "street_suffix": "Ave",
            "city_name": "Provo",
            "state_abbreviation": "UT",
            "zipcode": "84604",
            "plus4_code": "4405",
            "delivery_point": "14",
            "delivery_point_check_digit": "0"
        },
        "metadata": {
            "record_type": "S",
            "county_fips": "49049",
            "county_name": "Utah",
            "carrier_route": "C016",
            "congressional_district": "03",
            "latitude": 40.27586,
            "longitude": -111.6576,
            "precision": "Zip9"
        },
        "analysis": {
            "dpv_match_code": "Y",
            "dpv_footnotes": "AABBR1",
            "dpv_cmra": "Y",
            "dpv_vacant": "N",
            "ews_match": false
        }
    }
]
We provide an absolutely free subscription for low-usage users. Here's a link that explains all the fields:
http://wiki.smartystreets.com/liveaddress_api_users_guide#json-responses
EDIT: included latitude/longitude fields (newly released).