Regex for 3 types of address

Tags: c#, regex

I have 3 different patterns of address:

Avenue T, 55 - Sumiton, AL - USA
Avenue T - Sumiton, AL - USA
Sumiton, AL - USA

which means: [address][,][number][-][county][,][state][-][country]

I'm trying to use this regex, but it is not working correctly:

(?<street>.*\,)(?:\s*(?<number>[1-9][0-9]*))?\s*(?<county>.*\,)?\s*(?<State>.*\-)?\s*(?<Country>.*)

Regex tester

Any help, please? Thanks.

asked Oct 09 '19 12:10 by aasp


1 Answer

Your three patterns can be described as

${street}, ${number} - ${county}, ${State} - ${Country}
OPTIONAL   OPTIONAL   OBLIGATORY  OBLIGATORY OBLIGATORY

You may use

^(?!$)(?<street>.*?(?=(?:,\s*\d+)?\s*-\s*\w+,))?(?:,\s*(?<number>[1-9][0-9]*))?\s*(?:-\s*)?(?<county>[\w\s]+),\s*(?<State>[A-Z]{2})\s*-\s*(?<Country>.*)$
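A minimal C# sketch of how this pattern could be applied to the three sample addresses from the question. The class name `AddressDemo` and the helper `Describe` are made up for illustration; the pattern is the one above, split across string literals only for readability.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressDemo
{
    // The single-string pattern from the answer (concatenated, otherwise unchanged).
    private static readonly Regex AddressRegex = new Regex(
        @"^(?!$)(?<street>.*?(?=(?:,\s*\d+)?\s*-\s*\w+,))?" +
        @"(?:,\s*(?<number>[1-9][0-9]*))?\s*(?:-\s*)?(?<county>[\w\s]+)," +
        @"\s*(?<State>[A-Z]{2})\s*-\s*(?<Country>.*)$");

    // Hypothetical helper: joins the five named groups with '|'.
    // A group that did not take part in the match has an empty Value.
    public static string Describe(string input)
    {
        Match m = AddressRegex.Match(input);
        if (!m.Success) return "(no match)";
        return string.Join("|",
            m.Groups["street"].Value,
            m.Groups["number"].Value,
            m.Groups["county"].Value,
            m.Groups["State"].Value,
            m.Groups["Country"].Value);
    }

    public static void Main()
    {
        string[] samples =
        {
            "Avenue T, 55 - Sumiton, AL - USA",
            "Avenue T - Sumiton, AL - USA",
            "Sumiton, AL - USA",
        };
        foreach (string s in samples)
            Console.WriteLine(Describe(s));
        // Prints:
        // Avenue T|55|Sumiton|AL|USA
        // Avenue T||Sumiton|AL|USA
        // ||Sumiton|AL|USA
    }
}
```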

If you need to extract them from a multiline string, use

(?m)^(?!\r?$)(?<street>.*?(?=(?:,\s*\d+)?\s*-\s*\w+,))?(?:,\s*(?<number>[1-9][0-9]*))?\s*(?:-\s*)?(?<county>[\w\s]+),\s*(?<State>[A-Z]{2})\s*-\s*(?<Country>.*)\r?$
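A sketch of the multiline variant with `Regex.Matches`, assuming the input uses CRLF line endings and may contain blank lines. `MultilineAddressDemo` and `ParseLines` are hypothetical names. One thing to watch: in .NET, `.` matches `\r`, so the greedy Country group can pick up a trailing carriage return on CRLF input, which is why the sketch trims it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MultilineAddressDemo
{
    // The (?m) pattern from the answer (concatenated, otherwise unchanged).
    private static readonly Regex LineRegex = new Regex(
        @"(?m)^(?!\r?$)(?<street>.*?(?=(?:,\s*\d+)?\s*-\s*\w+,))?" +
        @"(?:,\s*(?<number>[1-9][0-9]*))?\s*(?:-\s*)?(?<county>[\w\s]+)," +
        @"\s*(?<State>[A-Z]{2})\s*-\s*(?<Country>.*)\r?$");

    // Hypothetical helper: one "street|number|county|State|Country" row per matched line.
    public static List<string> ParseLines(string text)
    {
        var rows = new List<string>();
        foreach (Match m in LineRegex.Matches(text))
        {
            // '.' matches '\r' in .NET, so strip a CR the greedy group may have captured.
            string country = m.Groups["Country"].Value.TrimEnd('\r');
            rows.Add(string.Join("|",
                m.Groups["street"].Value,
                m.Groups["number"].Value,
                m.Groups["county"].Value,
                m.Groups["State"].Value,
                country));
        }
        return rows;
    }

    public static void Main()
    {
        string text =
            "Avenue T, 55 - Sumiton, AL - USA\r\n" +
            "Avenue T - Sumiton, AL - USA\r\n" +
            "\r\n" +                       // blank line, skipped by (?!\r?$)
            "Sumiton, AL - USA";
        foreach (string row in ParseLines(text))
            Console.WriteLine(row);
    }
}
```

If the Country value should never include the carriage return in the first place, making that group lazy (`.*?`) would be one option, but that is a change to the pattern above rather than part of it.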

See the regex demo.

So, the street group is only populated when it is followed by an optional number part and then a hyphen and the obligatory county group.

Details

  • ^ - start of string/line (if (?m) is used)
  • (?!$) / (?!\r?$) - a negative lookahead preventing an empty string / line match
  • (?<street>.*?(?=(?:,\s*\d+)?\s*-\s*\w+,))? - Group "street": any 0+ chars as few as possible up to an optional sequence of ,, 0+ whitespaces, 1+ digits and then a - enclosed with 0+ whitespaces, 1+ word chars and a ,
  • (?:,\s*(?<number>[1-9][0-9]*))? - an optional non-capturing group matching ,, 0+ whitespaces and then captures into Group "number" a digit from 1 to 9 and then any 0+ digits
  • \s*(?:-\s*)?(?<county>[\w\s]+) - 0+ whitespaces, an optional sequence of a hyphen and then 0+ whitespaces, then captures into Group "county" any 1+ word and whitespace chars
  • ,\s*(?<State>[A-Z]{2}) - a comma, 0+ whitespaces and then captures into Group "State" 2 uppercase letters
  • \s*-\s* - a hyphen enclosed with 0+ whitespaces
  • (?<Country>.*) - Group "Country": any 0+ chars other than LF as many as possible
  • $ - end of string/line.
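The breakdown above can also be kept next to the pattern itself. Here is a sketch that writes the single-string pattern in verbose form with `RegexOptions.IgnorePatternWhitespace`, so each part carries its own comment, and uses `Group.Success` to tell an absent optional group from an empty one. The class and method names (`AnnotatedAddressRegex`, `HasStreet`, `Number`) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnnotatedAddressRegex
{
    // Same pattern as in the answer; whitespace and # comments are ignored
    // because of RegexOptions.IgnorePatternWhitespace.
    private const string Pattern = @"
        ^                                   # start of string
        (?!$)                               # do not match an empty string
        (?<street> .*?                      # street: as few chars as possible ...
          (?= (?:,\s*\d+)? \s*-\s*\w+, )    # ... up to an optional number, a hyphen and the county
        )?                                  # the whole street group is optional
        (?: ,\s* (?<number>[1-9][0-9]*) )?  # optional comma and number
        \s* (?:-\s*)?                       # optional hyphen separator
        (?<county> [\w\s]+ )                # county: word and whitespace chars
        ,\s* (?<State> [A-Z]{2} )           # comma, then two uppercase letters
        \s*-\s*                             # hyphen separator
        (?<Country> .* )                    # country: the rest of the string
        $                                   # end of string
    ";

    private static readonly Regex Rx =
        new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);

    // True when the optional street group took part in the match.
    public static bool HasStreet(string input)
    {
        return Rx.Match(input).Groups["street"].Success;
    }

    // The number as an int, or null when the optional number group is absent.
    public static int? Number(string input)
    {
        Group g = Rx.Match(input).Groups["number"];
        return g.Success ? int.Parse(g.Value) : (int?)null;
    }

    public static void Main()
    {
        Console.WriteLine(HasStreet("Avenue T - Sumiton, AL - USA"));          // True
        Console.WriteLine(HasStreet("Sumiton, AL - USA"));                     // False
        Console.WriteLine(Number("Avenue T, 55 - Sumiton, AL - USA"));         // 55
        Console.WriteLine(Number("Avenue T - Sumiton, AL - USA").HasValue);    // False
    }
}
```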
answered Oct 22 '22 21:10 by Wiktor Stribiżew