Select previous and next word in a string

Tags: string, c#, regex

I'm looping through a lot of strings like this one in C#:

“Look, good against remotes is one thing, good against the living, that’s something else.”

In these strings, I have a single selected word, determined by an index from a previous function, like the second "good" in the case above.

“Look, good (<- not this one) against remotes is one thing, good (<- this one) against the living, that’s something else.”

I want to find the words surrounding my selected word. In the case above, those are "thing" and "against".

“Look, good against remotes is one thing (<- previous word), good against (<- next word) the living, that’s something else.”

I have tried taking the string apart with .Split() and different approaches with regular expressions, but I can't find a good way to achieve this. I have access to the word ("good" in the example above) and the index (41 above) where it's located in the string.

It would be a huge bonus if it also stopped at punctuation such as commas, so that in the example above my theoretical function would only return "against", since there is a comma between "thing" and "good".

Is there a simple way to achieve this? Any help appreciated.

Magnus Engdal asked Nov 22 '13 19:11


2 Answers

Including the "huge bonus":

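// Requires: using System.Text.RegularExpressions;
string text = "Look, good against remotes is one thing, good against the living, that’s something else.";
string word = "good";
int index = 41; // zero-based position of the selected word in text

// Word at the very end of the text that precedes the selection (empty if punctuation intervenes).
string before = Regex.Match(text.Substring(0, index), @"(\w*)\s*$").Groups[1].Value;
// Word at the very start of the text that follows the selection.
string after = Regex.Match(text.Substring(index + word.Length), @"^\s*(\w*)").Groups[1].Value;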

In this case before will be an empty string because of the comma, and after will be "against".

Explanation: To get before, the first step is to grab the part of the string that ends just before the target word; text.Substring(0, index) does this. Then we use the regular expression (\w*)\s*$ to match and capture a word (\w*) followed by any amount of whitespace (\s*) at the end of the string ($). The contents of the first capture group is the word we want. If no word sits directly before the target, the regex will still match, but only an empty string or whitespace, and the first capture group will contain an empty string.

The logic for getting after is pretty much the same, except that text.Substring(index + word.Length) is used to get the rest of the string after the target word. The regex ^\s*(\w*) is similar except that it is anchored to the beginning of the string with ^ and the \s* comes before the \w* since we need to strip off whitespace on the front end of the word.
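
Since the question loops over many strings, the two calls above could be wrapped in a small helper. This is only a sketch: the class name WordNeighbours and the method Find are hypothetical names chosen for illustration, and it assumes index and length always describe a word that really sits at that position in text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordNeighbours
{
    // Hypothetical helper, not part of the answer above.
    // Returns the word directly before and directly after the selected word.
    // A side is an empty string when punctuation or the end of the text
    // lies between the selection and its neighbour.
    public static (string Before, string After) Find(string text, int index, int length)
    {
        // Everything up to (not including) the selected word.
        string head = text.Substring(0, index);
        // Everything after the selected word.
        string tail = text.Substring(index + length);

        // Same patterns as in the answer: a word plus optional whitespace,
        // anchored at the end of head and at the start of tail.
        string before = Regex.Match(head, @"(\w*)\s*$").Groups[1].Value;
        string after = Regex.Match(tail, @"^\s*(\w*)").Groups[1].Value;

        return (before, after);
    }
}
```

For the example sentence, WordNeighbours.Find(text, 41, word.Length) gives an empty Before and "against" as After, while selecting "against" at index 11 gives "good" and "remotes". The tuple return type needs C# 7 or later; on older compilers two out parameters would do the same job.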

Andrew Clark answered Oct 12 '22 04:10


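string phrase = "Look, good against remotes is one thing, good against the living, that’s something else.";
int selectedPosition = 41;
char[] ignoredSpecialChars = new char[2] { ',', '.' };

// Split(' ')[0] is the selected word itself, so [1] is the word that follows it.
string afterWord = phrase.Substring(selectedPosition)
                         .Split(' ')[1]
                         .Trim(ignoredSpecialChars);
// TrimEnd() drops the space just before the selected word; without it,
// Last() would return an empty string. Last() requires "using System.Linq;".
string beforeWord = phrase.Substring(0, selectedPosition)
                          .TrimEnd()
                          .Split(' ')
                          .Last()
                          .Trim(ignoredSpecialChars);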

You can change the ignoredSpecialChars array to control which special characters are stripped from the result.

UPDATE:

This version returns null for a neighbouring word if there is a special character between it and your selected word.

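string phrase = "Look, good against remotes is one thing, good against the living, that’s something else.";
int selectedPosition = 41;
char[] ignoredSpecialChars = new char[2] { ',', '.' };

// First() and Last() require "using System.Linq;".
string afterWord = phrase.Substring(selectedPosition)
                         .Split(' ')[1];
// Keep the next word only if it starts with a letter or digit.
afterWord = Char.IsLetterOrDigit(afterWord.First()) ?
            afterWord.TrimEnd(ignoredSpecialChars) :
            null;

// TrimEnd() drops the space before the selected word; without it, Last()
// would return an empty string and beforeWord.Last() would throw.
string beforeWord = phrase.Substring(0, selectedPosition)
                          .TrimEnd()
                          .Split(' ')
                          .Last();
// Keep the previous word only if it ends with a letter or digit (here "thing," ends with a comma, so the result is null).
beforeWord = Char.IsLetterOrDigit(beforeWord.Last()) ?
             beforeWord.TrimStart(ignoredSpecialChars) :
             null;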
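
The snippets above assume there is always a word on both sides of the selection. If the selected word can be the first or last word of the string, Split(' ')[1] or Last() may throw. The following is a minimal sketch of a more defensive variant; SplitNeighbours, After and Before are hypothetical names, and it additionally treats punctuation attached to the selected word itself (as in "living,") as a boundary, which is an assumption about the desired behaviour rather than something this answer states.

```csharp
using System;
using System.Linq;

public static class SplitNeighbours
{
    private static readonly char[] Punctuation = { ',', '.' };

    // Hypothetical helper: the word after the selection, or null when there is
    // no next word or punctuation sits between the selection and that word.
    public static string After(string phrase, int position)
    {
        // parts[0] is the selected word plus anything glued to it, e.g. "living,".
        string[] parts = phrase.Substring(position)
                               .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 2) return null;                         // nothing follows
        if (!char.IsLetterOrDigit(parts[0].Last())) return null;   // e.g. "living,"
        if (!char.IsLetterOrDigit(parts[1].First())) return null;  // next token starts with punctuation
        return parts[1].TrimEnd(Punctuation);
    }

    // Hypothetical helper: the word before the selection, or null when there is
    // no previous word or it ends with punctuation.
    public static string Before(string phrase, int position)
    {
        string[] parts = phrase.Substring(0, position)
                               .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length == 0) return null;                        // selection is the first word
        string last = parts.Last();
        return char.IsLetterOrDigit(last.Last()) ? last.TrimStart(Punctuation) : null;
    }
}
```

With the example phrase, After(phrase, 41) returns "against" and Before(phrase, 41) returns null because of the comma after "thing". Before(phrase, 0) and After at the position of "else" both return null instead of throwing.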

Nikolai Samteladze answered Oct 12 '22 05:10