
Remove specific words from a string

Tags:

c#

regex

I am trying to parse a file of street names for a project and need to remove modifiers (Upper / Lower / Old / New / North / East / South / West ...) and endings (Street / Road / Way / Lane ...), but I am having no luck with a regular expression.

The way it is set up at the moment is that the program will parse the file one line (ie. street) at a time, and check it

I think the problem is word boundaries. What I need, for example, are the following transformations:
Old Harrow Way -> Harrow (ie. remove 'Old' prefix and 'Way' ending)
Chittock Mead -> Chittock (Remove the ending 'Mead')
- But to leave these alone when in a word:
Gold Lane -> Gold (just remove ending)
Eastley Avenue -> Eastley (just remove ending)
Upper Western Avenue -> Western (remove prefix and ending)

Obviously, things like "South Street" would remove both - This is ok, because I can discard an empty string.

Can anyone give me an idea of how to do this - I've been reading up on regular expressions and trying things for hours!

Richard asked Feb 22 '11 21:02


2 Answers

I would use a List<string> or an array to store those values, then a foreach loop to check the address against each item. You would then remove each instance of the list or array item from the address (for example with string.Replace, since string.Remove works on index positions rather than on a substring). There is more to this, but that is the general idea.
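As a rough illustration of the list-and-loop idea, here is a minimal sketch. It compares whole words instead of removing substrings from the raw string, because a plain substring removal of "East" would turn "Eastley Avenue" into "ley Avenue". The class name, method name, and word list are hypothetical, and the sketch removes a listed word wherever it appears, not only at the start or end.

```csharp
using System;
using System.Collections.Generic;

public static class WordListRemover
{
    // Hypothetical word list for illustration; a real one would be longer.
    private static readonly List<string> WordsToRemove = new List<string>
    {
        "Upper", "Lower", "Old", "New", "North", "East", "South", "West",
        "Street", "Road", "Way", "Lane", "Avenue", "Mead"
    };

    public static string RemoveListedWords(string street)
    {
        var kept = new List<string>();

        // Work word by word so "Old" never matches inside "Gold".
        foreach (string word in street.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            bool listed = false;
            foreach (string candidate in WordsToRemove)
            {
                if (string.Equals(word, candidate, StringComparison.OrdinalIgnoreCase))
                {
                    listed = true;
                    break;
                }
            }

            if (!listed)
            {
                kept.Add(word);
            }
        }

        // An input made only of listed words (e.g. "South Street") yields "".
        return string.Join(" ", kept);
    }
}
```

Calling WordListRemover.RemoveListedWords("Old Harrow Way") should give "Harrow", and "Gold Lane" should give "Gold".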

The Muffin Man answered Oct 14 '22 02:10


I'd use string.Split(' ') to split the address into an array of words. Then take the first word and see if it exists in a list of prefixes (i.e. a List or an array). Do the same for the last word and the endings.

Running through two lists of regular expressions for each input address will be time-consuming. Using my logic should be a good deal faster, especially if the lists are sorted and binary-searched.
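The split-and-check approach described above could look roughly like this. It is only a sketch: the class name, method names, and the two word lists are hypothetical, and it strips at most one prefix and one ending per street. The lists are sorted once and then searched with Array.BinarySearch, using the same case-insensitive comparer for both steps.

```csharp
using System;

public static class StreetNameTrimmer
{
    // Hypothetical lists for illustration; extend as needed.
    private static readonly string[] Prefixes =
        { "Upper", "Lower", "Old", "New", "North", "East", "South", "West" };
    private static readonly string[] Endings =
        { "Street", "Road", "Way", "Lane", "Avenue", "Mead" };

    static StreetNameTrimmer()
    {
        // Binary search requires the arrays to be sorted with the same comparer.
        Array.Sort<string>(Prefixes, StringComparer.OrdinalIgnoreCase);
        Array.Sort<string>(Endings, StringComparer.OrdinalIgnoreCase);
    }

    private static bool IsListed(string[] sortedWords, string word)
    {
        // BinarySearch returns a negative number when the word is absent.
        return Array.BinarySearch<string>(sortedWords, word, StringComparer.OrdinalIgnoreCase) >= 0;
    }

    public static string TrimStreet(string street)
    {
        string[] words = street.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        int start = 0;
        int end = words.Length; // exclusive upper bound

        // Only the first word is tested as a prefix, only the last as an ending.
        if (start < end && IsListed(Prefixes, words[start])) start++;
        if (start < end && IsListed(Endings, words[end - 1])) end--;

        // "South Street" leaves nothing, so the result is an empty string.
        return string.Join(" ", words, start, end - start);
    }
}
```

Because only whole first and last words are compared, "Gold Lane" keeps "Gold" and "Upper Western Avenue" keeps "Western". For lists this small a HashSet<string> would work just as well; the sorted arrays simply follow the binary-search suggestion.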

If the address data is a bit dirty (i.e. punctuation, double spaces, etc.), you may want to do some cleanup first, as an input string like " Main  St" (leading space, two spaces between the words) will split into more "words" than are really there (hint: Trim() and Regex.Replace(input, " +", " ")).
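A cleanup pass along those lines might look like the sketch below. The class and method names are hypothetical, and the set of punctuation characters to drop is an assumption; adjust it to whatever actually appears in the data (apostrophes are deliberately left alone here).

```csharp
using System.Text.RegularExpressions;

public static class AddressCleaner
{
    public static string Normalize(string raw)
    {
        // Assumption: these punctuation marks carry no meaning in a street name.
        string noPunctuation = Regex.Replace(raw, "[.,;:]", " ");

        // Collapse any run of whitespace into a single space.
        string singleSpaced = Regex.Replace(noPunctuation, @"\s+", " ");

        // Remove leading and trailing spaces so Split does not see empty words.
        return singleSpaced.Trim();
    }
}
```

Running the result through the split step afterwards means an input with a leading space and a double space, such as " Main  St", is treated as exactly two words.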

Marc Bernier answered Oct 14 '22 03:10