 

How to Replace Multiple Words in a String Using C#?

Tags: string, c#

I'm wondering how I can remove multiple words (500+) from a string. I know I can use the Replace method to do this for a single word, but what if I want to remove 500+ words? I want to strip all generic keywords (such as "and", "I", "you", etc.) from an article.

Here is the code for one replacement. I'm looking to do 500+.

        string a = "why and you it";
        string b = a.Replace("why", "");
        MessageBox.Show(b);

Thanks

@Sergey Kucher: The text size will vary from a few hundred to a few thousand words. I am removing these words from random articles.

user1926567 asked Aug 04 '13 06:08




1 Answer

I would normally do something like:

// Needs the System, System.Collections.Generic and
// System.Text.RegularExpressions namespaces.
// If you want the search/replace to be case sensitive, remove the 
// StringComparer.OrdinalIgnoreCase
Dictionary<string, string> replaces = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) { 
    // The format is word to be searched, word that should replace it
    // or String.Empty to simply remove the offending word
    { "why", "xxx" }, 
    { "you", "yyy" },
};

void Main()
{
    string a = "why and you it and You it";

    // \w+ matches runs of letters, digits and underscores (abc/abcd/ab1234);
    // each run is passed to the Replacer method
    string b = Regex.Replace(a, @"\w+", Replacer);
}

string Replacer(Match m)
{
    string found = m.ToString();

    string replace;

    // If the word found is in the dictionary then it's placed in the 
    // replace variable by the TryGetValue
    if (!replaces.TryGetValue(found, out replace))
    {
        // otherwise replace the word with the same word (so do nothing)
        replace = found;
    }
    else
    {
        // The word is in the dictionary. replace now contains the
        // word that will substitute it.

        // At this point you could add some code to maintain upper/lower 
        // case between the words (so that if you -> yyy then You becomes Yyy
        // and YOU becomes YYY)
    }

    return replace;
}
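
The case handling mentioned in the last comment could be sketched roughly as follows. This is only one possible reading of it, not part of the code above: the MatchCase helper and the CasePreservingReplacer class are hypothetical names, and the rule (all upper case, or first letter upper case, otherwise unchanged) is an assumption about what "maintain case" should mean.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CasePreservingReplacer
{
    // Same shape as the dictionary above: word to find -> replacement.
    private static readonly Dictionary<string, string> Replaces =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "why", "xxx" },
            { "you", "yyy" },
        };

    // Hypothetical helper: copies the casing pattern of 'found' onto 'replacement'.
    public static string MatchCase(string found, string replacement)
    {
        if (found.Length == 0 || replacement.Length == 0)
            return replacement;

        // YOU -> YYY (more than one character, nothing in lower case)
        if (found.Length > 1 && found == found.ToUpperInvariant())
            return replacement.ToUpperInvariant();

        // You -> Yyy (first character is upper case)
        if (char.IsUpper(found[0]))
            return char.ToUpperInvariant(replacement[0]) + replacement.Substring(1);

        // you -> yyy (leave the replacement as written in the dictionary)
        return replacement;
    }

    public static string Replace(string input)
    {
        // \w+ hands every word to the lambda, as in the Replacer method above.
        return Regex.Replace(input, @"\w+", m =>
        {
            string replacement;
            return Replaces.TryGetValue(m.Value, out replacement)
                ? MatchCase(m.Value, replacement)
                : m.Value;
        });
    }

    public static void Main()
    {
        // Prints: xxx and yyy it and Yyy it and YYY
        Console.WriteLine(Replace("why and you it and You it and YOU"));
    }
}
```

Mixed casing such as "yOU" falls through to the last branch and gets the dictionary value unchanged; whether that is acceptable depends on the articles being processed.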

As someone else wrote, but without the substring problem (you don't want to remove "ass" from "classes" :-) ), and suitable only if you just need to remove words:

var escapedStrings = yourReplaces.Select(Regex.Escape);
string result = Regex.Replace(yourInput, @"\b(" + string.Join("|", escapedStrings) + @")\b", string.Empty);

I use the \b word boundary anchor. It matches the position between a word character (\w) and a non-word character, or the start/end of the string, so the pattern only matches whole words :-)
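
To make the difference concrete, here is a small sketch of the removal approach next to a plain string.Replace. The StopWordRemover class and its Remove method are hypothetical names; RegexOptions.IgnoreCase and the whitespace clean-up at the end are additions that the two-line snippet above does not have.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StopWordRemover
{
    public static string Remove(string input, string[] stopWords)
    {
        // Escape each word so characters such as '.' or '+' are matched literally.
        string alternation = string.Join("|", stopWords.Select(w => Regex.Escape(w)));

        // \b(...)\b only matches whole words. IgnoreCase is an addition here.
        string pattern = @"\b(" + alternation + @")\b";
        string withoutWords = Regex.Replace(input, pattern, string.Empty, RegexOptions.IgnoreCase);

        // Removed words leave runs of spaces behind; collapse them (also an addition).
        return Regex.Replace(withoutWords, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        string[] stopWords = { "and", "I", "you", "ass" };

        // Plain Replace also hits substrings. Prints: cles
        Console.WriteLine("classes".Replace("ass", ""));

        // With word boundaries "classes" is left alone. Prints: think like classes
        Console.WriteLine(Remove("I think you and I like classes", stopWords));
    }
}
```

Note that \b relies on \w characters, so a stop word that starts or ends with punctuation (for example "c#") would probably need different handling.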

xanatos answered Oct 31 '22 19:10