I need to remove words from a string based on a set of words:
Words I want to remove:
DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND
If I receive a string like:
THIS IS AN AMAZING WEBSITE AND LAYOUT
(EDIT: this string has already been "cleaned" of any symbols.)
The result should be:
THIS IS AMAZING WEBSITE LAYOUT
So far I have:
public static string StringWordsRemove(string stringToClean, string wordsToRemove)
{
    string[] splitWords = wordsToRemove.Split(new Char[] { ' ' });
    string pattern = "";
    foreach (string word in splitWords)
    {
        pattern = @"\b" + word + "\b";
        stringToClean = Regex.Replace(stringToClean, pattern, "");
    }
    return stringToClean;
}
But it's not removing the words. Any idea why?
I also don't know if this is the most efficient way to do it. Maybe I should put the words in an array to avoid splitting them on every call?
Thanks
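using System.Collections.Generic;
using System.Linq;

// Split the word list once, when the class is first used, instead of on every call.
private static readonly List<string> wordsToRemove =
    "DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND".Split(' ').ToList();

public static string StringWordsRemove(string stringToClean)
{
    // Where (rather than Except) keeps repeated words; Except would also de-duplicate the input.
    return string.Join(" ", stringToClean.Split(' ').Where(word => !wordsToRemove.Contains(word)));
}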
Modification to handle punctuation:
public static string StringWordsRemove(string stringToClean)
{
    // Define how to tokenize the input string, i.e. on spaces only or on punctuation as well.
    // The separators themselves are dropped from the result.
    return string.Join(" ", stringToClean
        .Split(new[] { ' ', ',', '.', '?', '!' }, StringSplitOptions.RemoveEmptyEntries)
        .Where(word => !wordsToRemove.Contains(word))); // Where keeps repeated words; Except would de-duplicate
}
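The versions above compare words case-sensitively and scan a list for every token. If mixed-case input is possible, a set with a case-insensitive comparer is one way to handle it. This is only a minimal sketch: the class name WordFilter and the choice of StringComparer.OrdinalIgnoreCase are assumptions for illustration, since the question only shows upper-case input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, named here only for illustration.
public static class WordFilter
{
    // Built once; lookups are hash-based and ignore case (an assumption, see above).
    private static readonly HashSet<string> WordsToRemove = new HashSet<string>(
        "DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND".Split(' '),
        StringComparer.OrdinalIgnoreCase);

    public static string StringWordsRemove(string stringToClean)
    {
        // Keep every token that is not in the set, preserving order and repeats.
        var kept = stringToClean
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !WordsToRemove.Contains(word));
        return string.Join(" ", kept);
    }
}
```

Calling WordFilter.StringWordsRemove("THIS IS AN AMAZING WEBSITE AND LAYOUT") returns "THIS IS AMAZING WEBSITE LAYOUT".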
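I just changed this line
    pattern = @"\b" + word + "\b";
to this
    pattern = @"\b" + word + @"\b"; // added '@'
Without the '@', the second "\b" is a regular string literal, so C# turns \b into a backspace character and the regex never sees a closing word boundary. With the change, the words are removed:
THIS IS  AMAZING WEBSITE  LAYOUT
Note that each removed word leaves its surrounding spaces behind, so double spaces remain where a word was.
As a matter of readability, you could also use String.Empty instead of "" (the two are equivalent), like:
    stringToClean = Regex.Replace(stringToClean, pattern, String.Empty);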
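Since the question also asks about efficiency, one option is to combine all the words into a single pattern so the input is scanned once instead of once per word. The following is a sketch under assumptions, not the code from the question: the class name RegexWordRemover and the method BuildPattern are hypothetical, and the final whitespace collapse is added so that removed words do not leave double spaces.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, named here only for illustration.
public static class RegexWordRemover
{
    // Builds one pattern of the form \b(?:DE|DA|...|AND)\b from a space-separated word list.
    public static Regex BuildPattern(string wordsToRemove)
    {
        var alternatives = wordsToRemove
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(word => Regex.Escape(word)); // guard against regex metacharacters in a word
        return new Regex(@"\b(?:" + string.Join("|", alternatives) + @")\b");
    }

    public static string StringWordsRemove(string stringToClean, Regex pattern)
    {
        string withoutWords = pattern.Replace(stringToClean, string.Empty);
        // Removing a word leaves its neighbouring spaces, so collapse whitespace runs and trim.
        return Regex.Replace(withoutWords, @"\s+", " ").Trim();
    }
}
```

Build the pattern once, for example with RegexWordRemover.BuildPattern("DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND"), and reuse it for every call. For the sample input this yields "THIS IS AMAZING WEBSITE LAYOUT" with single spaces.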