I would usually do something like this using a string libray. But I wonder if it can be done using a regex.
I want to do the following: Given a search string:
Seattle is awesome
I want to find all substrings of it in a given sentence. So applying a regex to the following sentence
Seattle Seattle is awesome is awesome awesome is Seattle
Should give me
Seattle, Seattle is awesome, is awesome, awesome, is, Seattle
One constraint that might be helpful is that the sentence will always have only the words present in the search string and whitespaces in between.
Note If there's a match, it should be the longest possible string. So like in the above example the matches shouldn't be single words rather longest possible substrings. Order among words also needs to be maintained. That's why
awesome is Seattle
in the sentence above gives us
awesome, is and Seattle
I am not sure if something like this can be done with a regex though, since it is greedy. Would appreciate any insight into this! I'm familiar with both C# and Java and could use either of their regex libraries.
I don't think you can do this with a regex. Wikipedia has a good article on the longest common subsequence problem.
There is no good way to express such a pattern directly with a regex.
You'll need to list all allowed combinations:
Seattle is awesome|Seattle is|Seattle|is awesome|is|awesome
or more succinctly:
Seattle( is( awesome)?)?|is( awesome)?|awesome
You can programmatically transform your input string to this format.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With