Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL street address fuzzy search

Does anyone know of a good script (or good strategy) for doing a fuzzy street address search on a MySQL database? The key issues are:

  • capitalization (easy -- just use LCASE)
  • punctuation (could use REPLACE; not sure it there are more efficient options)
  • abbreviations (this is the tough one -- street = st, etc.)

I want to be able to match: 123 Main st, unit B = 123 Main Street Unit b

like image 901
lcdservices Avatar asked May 14 '11 17:05

lcdservices


1 Answers

While it's not perfect and can be quite slow, you'll want to use a regular expression via REGEXP().

Here's a first-pass regular expression to match most cases (as well as your example):

(?isx)                  # search across multiple lines and ignore case
(                       # full match
  (                       # st number - what about number words like one or two?
    \d+
  )
  \s+                     # whitespace
  (                       # street name (one or more words)
    [a-z]+
    (?:
      \s+
      [a-z]+
    )*
  )
  \s+                     # whitespace
  (                       # street type
    al(?:y\.?|ley)          # aly, aly. or alley
  |
    ave(?:\.|nue)?          # ave, ave., or avenue
  |
    b(?lvd\.?|oulevard)     # blvd, blvd. or boulevard
  |
    c(?:t\.?|ourt)          # ct, ct. or court
  |
    cir(?:\c\.?|cle)?       # cir, circ, circ. or circle
  |
    cres(?:\.|cent)?        # cres, cres. or crescent
  |
    dr(?:\.|ive)?           # dr, dr. or drive
  |
    exp(?:y\.?|ressway)     # expy, expy. or expressway
  |
    f(?:wy\.?|reeway)       # fwy, fwy. or freeway
  |
    g(?:rdns\.?|ardens)     # grdns, grdns. or gardens
  |
    h(?:wy\.?|ighway)       # hwy, hwy. or highway
  |
    l(?n\.?|ane)            # ln, ln. or land
  |
    m(?:nr\.?|anor)         # mnr, mnr. or manor
  |
    m(?:trwy\.?|otorway)    # mtrwy, wtrwy. or motorway
  |
    pl(?:\.|ace)?           # pl, pl. or place
  |
    r(?:d\.?|oad)           # rd, rd. or road
  |
    st(?:\.|reet)?          # st, st. or street
  |
    t(?:pk\.?|urnpike)      # tpk, tpk. or turnpike
  |
    ter(?:\r?\.?|race)      # ter, ter., terr, terr. or terrace
  |
    tr(?:l.\?|ail)          # trl, trl. or trail
  |
    pike|park|walk|loop|bay|close|gate|highlands
  |
    row|way|oval|dell|rise|vale|byway|lawn
  )
  \,?                     # optional comma
  \s+                     # whitespace
  (                       # optional number, unit, apt or floor
    (
      \#                    # number
    |
      unit                  # unit
    |
      num(?:\.|ber)         # num, num. or number
    |
      ap(?:t\.?|artment)    # apt, apt. or apartment
    |
      fl(?:\.|oor)?         # fl, fl. or floor
    )
    \s+
    \d+
  )?
)

Which will return:

$1 - full match

$2 - house number

$3 - street name

$4 - street type

$5 - unit or apt number (optional)

To use this in mysql, you'll need to strip out all the comments (from '#' to eol), remove the first line (switch options), and collapse everything to a single line without any whitespace.

like image 68
Rob Raisch Avatar answered Oct 03 '22 19:10

Rob Raisch