
Fixing street names with regex

Tags: regex, php

I've got to solve a regex problem that might be too specific. Looking through Stack Overflow I've made some good discoveries, but I have not been able to piece them together to make it work.

Basically I want this:

lorem ipsum north road => lorem ipsum rd (n)

north lorem ipsum rd => lorem ipsum rd (n)

lorem ipsum road north => lorem ipsum rd (n)

As part of an autocomplete program, I need to convert partial text to the correct version so it can be checked against the database.

lorem ipsum south rd => lorem ipsum rd (s)

west lorem ipsum road => lorem ipsum rd (w)

I don't want somebody to code this program for me, but I would like to know the best way of tackling the problem.

Now you might ask me why I bother, as people would not write with such f'd up grammar, but that's because I'm not only dealing with English :(

Cheers

Moak asked Dec 14 '10 04:12


1 Answer

It seems to me that the hardest part is using regexes to find the right words in the right position in the line. So, although it is not elegant, could this be a more manageable way to do it with a minimum of regex?

  1. Extract all the known words and their variations (road types, directions, numbers, ...) from the address line; hopefully, what is left is the road name.

  2. Compose the address line back, but in the order we need (road name + road type + direction).
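
The two steps above could be sketched in PHP roughly like this. It is only an illustration under assumptions: the function name `normalizeStreet` and the two small lookup tables are made up for the example, the tables are nowhere near complete, and the lowercasing assumes plain ASCII input.

```php
<?php
// Hypothetical lookup tables: known variants mapped to a canonical form.
$roadTypes  = ['road' => 'rd', 'rd' => 'rd', 'avenue' => 'av', 'av' => 'av'];
$directions = ['north' => 'n', 'south' => 's', 'east' => 'e', 'west' => 'w'];

function normalizeStreet(string $line, array $roadTypes, array $directions): string
{
    // The only regex needed here is the one that splits the line into words.
    $words = preg_split('/\s+/', strtolower(trim($line)), -1, PREG_SPLIT_NO_EMPTY);

    $type = null;
    $direction = null;
    $name = [];

    // Step 1: pull out the known words, wherever they sit in the line.
    foreach ($words as $word) {
        if ($type === null && isset($roadTypes[$word])) {
            $type = $roadTypes[$word];
        } elseif ($direction === null && isset($directions[$word])) {
            $direction = $directions[$word];
        } else {
            $name[] = $word; // whatever is left is treated as the road name
        }
    }

    // Step 2: compose the line back as road name + road type + direction.
    $parts = $name;
    if ($type !== null) {
        $parts[] = $type;
    }
    if ($direction !== null) {
        $parts[] = '(' . $direction . ')';
    }
    return implode(' ', $parts);
}

echo normalizeStreet('north lorem ipsum rd', $roadTypes, $directions), "\n";   // lorem ipsum rd (n)
echo normalizeStreet('lorem ipsum road north', $roadTypes, $directions), "\n"; // lorem ipsum rd (n)
```

Because only the first road type and the first direction are extracted, a street whose actual name is a direction or a road type (for example "north road") would need extra rules that this sketch does not attempt.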

Once position no longer matters, the number of theoretically possible variations is still large, but the variations you can realistically expect shouldn't be that many, even accounting for spelling mistakes. Avenue: Avenu, Avene, Aveniu, Avn, Av. Road: Rd, Roud, Roade.
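
One possible way to catch misspellings without listing every one of them is to compare each word against the full road-type words with PHP's built-in `levenshtein()` and accept anything within a small edit distance. The sketch below is an assumption about how that might look, not a tested recipe: `matchRoadType`, the two tables, and the threshold of 2 are all hypothetical choices for the example.

```php
<?php
// Hypothetical tables: abbreviations matched exactly, full words matched fuzzily.
$abbreviations = ['rd' => 'rd', 'av' => 'av', 'avn' => 'av'];
$fullWords     = ['road' => 'rd', 'avenue' => 'av'];

function matchRoadType(string $word, array $abbreviations, array $fullWords, int $maxDistance = 2): ?string
{
    $word = strtolower($word);

    // Abbreviations are too short for fuzzy matching, so require an exact hit.
    if (isset($abbreviations[$word])) {
        return $abbreviations[$word];
    }

    // Otherwise keep the closest full word, if it is within $maxDistance edits.
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($fullWords as $full => $canonical) {
        $distance = levenshtein($word, $full);
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $canonical;
        }
    }
    return $best; // null when no known word is close enough
}

foreach (['Avenu', 'Aveniu', 'Roud', 'Rd', 'lorem'] as $word) {
    echo $word, ' => ', matchRoadType($word, $abbreviations, $fullWords) ?? 'no match', "\n";
}
```

Two caveats apply. Real words can sit very close to a road type ("broad" is one edit away from "road"), so an explicit list of accepted variants may turn out to be safer than a distance threshold. Also, `levenshtein()` compares bytes rather than characters, so the distances can be misleading for non-ASCII text.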

AJJ answered Oct 01 '22 04:10