I've got to solve a regex problem that might be too specific. Looking through Stack Overflow I've made some good discoveries, but I have not been able to piece them together to make it work.
Basically I want this:
lorem ipsum north road
=> lorem ipsum rd (n)
north lorem ipsum rd
=> lorem ipsum rd (n)
lorem ipsum road north
=> lorem ipsum rd (n)
As part of an autocomplete program, I need to convert partial text to the correct version so that it can be looked up in the database:
lorem ipsum south rd => lorem ipsum rd (s)
west lorem ipsum road => lorem ipsum rd (w)
I don't want somebody to code this program for me, but I would like to know the best way of tackling the problem.
Now you might ask me why I bother, as people would not write with such f'd up grammar, but that's because I'm not only dealing with English :(
Cheers
It seems to me that the most difficult part is using regexps to find the right words in the right position in the line. So, although it is not elegant, could this be a more manageable way to do it with minimal regexps?
Extract all the known words and their variations (road types, directions, numbers, ...) from the address line and, hopefully, we would be left with the road name.
Compose the address line back, but in the order we need (road name + road type + direction).
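The two steps above could be sketched roughly like this in Python. The lookup tables `ROAD_TYPES` and `DIRECTIONS` and the function name `normalize_address` are made up for illustration; a real program would need fuller tables for each language, and would need extra care for names that themselves contain a direction or road-type word.

```python
# Hypothetical lookup tables: known word -> canonical abbreviation.
ROAD_TYPES = {"road": "rd", "rd": "rd"}
DIRECTIONS = {"north": "n", "south": "s", "east": "e", "west": "w"}


def normalize_address(line):
    """Extract road type and direction, then rebuild as 'name type (dir)'."""
    name_words = []
    road_type = None
    direction = None
    for word in line.lower().split():
        if direction is None and word in DIRECTIONS:
            direction = DIRECTIONS[word]
        elif road_type is None and word in ROAD_TYPES:
            road_type = ROAD_TYPES[word]
        else:
            # Anything not recognised is assumed to belong to the road name.
            name_words.append(word)
    parts = list(name_words)
    if road_type is not None:
        parts.append(road_type)
    if direction is not None:
        parts.append("(" + direction + ")")
    return " ".join(parts)


print(normalize_address("north lorem ipsum rd"))
```

Because the position of each word no longer matters, all three orderings from the question go through the same code path without any position-dependent regexp.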
Once you get rid of the position, the number of theoretically possible variations is still large, but the predictable variations shouldn't be that many, even accounting for spelling mistakes. Avenue: Avenu, Avene, Aveniu, Avn, Av. Road: Rd, Roud, Roade.
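For the spelling mistakes, one option (my assumption, not the only way) is to list the short abbreviations explicitly and let the standard library's `difflib.get_close_matches` catch near-misses of the full words. The names `ABBREVIATIONS`, `FULL_WORDS` and `canonical_road_type`, and the cutoff of 0.7, are illustrative choices that would need tuning against real data.

```python
import difflib

# Hypothetical table of abbreviations too short for fuzzy matching.
ABBREVIATIONS = {"av": "avenue", "avn": "avenue", "ave": "avenue", "rd": "road"}
# Full canonical words that misspellings are compared against.
FULL_WORDS = ["avenue", "road"]


def canonical_road_type(word, cutoff=0.7):
    """Return the canonical road type for a word, or None if nothing is close."""
    word = word.lower()
    if word in ABBREVIATIONS:
        return ABBREVIATIONS[word]
    # get_close_matches returns the best candidates whose similarity
    # ratio is at least `cutoff`, best match first.
    matches = difflib.get_close_matches(word, FULL_WORDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for candidate in ["Avenu", "Aveniu", "Roud", "Av", "lorem"]:
    print(candidate, "->", canonical_road_type(candidate))
```

Words that are not close to any known road type come back as `None`, so they can be left in the road name.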