
Remove words from a string based on a list of words in C#

Tags: arrays, string, c#

I need to remove words from a string based on a set of words:

Words I want to remove:

DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND

If I receive a string like this (EDIT: the string has already been cleaned of any symbols):

THIS IS AN AMAZING WEBSITE AND LAYOUT

The result should be:

THIS IS AMAZING WEBSITE LAYOUT

So far I have:

public static string StringWordsRemove(string stringToClean, string wordsToRemove)
{
    string[] splitWords = wordsToRemove.Split(new Char[] { ' ' });

    string pattern = "";

    foreach (string word in splitWords)
    {
        pattern = @"\b" + word + "\b";
        stringToClean = Regex.Replace(stringToClean, pattern, "");
    }

    return stringToClean;
}

But it's not removing the words. Any idea why?

I don't know if this is the most efficient way to do it. Maybe I should put the words in an array to avoid splitting them every time?

Thanks

Patrick asked Jul 16 '13 14:07




2 Answers

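// Requires: using System.Collections.Generic; using System.Linq;
private static readonly HashSet<string> wordsToRemove = new HashSet<string>(
    "DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND".Split(' '));

public static string StringWordsRemove(string stringToClean)
{
    // Where keeps repeated words; Except would also collapse duplicates in the input.
    return string.Join(" ", stringToClean.Split(' ').Where(word => !wordsToRemove.Contains(word)));
}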

A variation that also splits on punctuation:

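public static string StringWordsRemove(string stringToClean)
{
    // The separator list defines the tokens: spaces only, or punctuation as well.
    // Note that the separators themselves are dropped from the result.
    // Where keeps repeated words; Except would also collapse duplicates in the input.
    return string.Join(" ", stringToClean
        .Split(new[] { ' ', ',', '.', '?', '!' }, StringSplitOptions.RemoveEmptyEntries)
        .Where(word => !wordsToRemove.Contains(word)));
}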
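
The snippets above compare words exactly, so lowercase input such as "and" would not be removed. If case-insensitive matching is wanted, something along these lines might work. This is only a sketch: the class name WordFilter and the method name RemoveWords are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFilter
{
    // Hypothetical helper: the stop-word set is built once, with a
    // case-insensitive comparer so "and" and "AND" are treated alike.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        "DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND".Split(' '),
        StringComparer.OrdinalIgnoreCase);

    public static string RemoveWords(string input)
    {
        // Split on spaces, drop empty tokens, and keep every word
        // that is not in the stop-word set (order and repeats preserved).
        var kept = input
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !StopWords.Contains(word));
        return string.Join(" ", kept);
    }
}
```

For example, WordFilter.RemoveWords("This is an amazing website and layout") should return "This is amazing website layout". A HashSet lookup is also cheaper than scanning a list for every word, which matters mostly for long inputs.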
Fung answered Sep 28 '22 06:09


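The second "\b" is missing the @ prefix. In a regular (non-verbatim) C# string literal, "\b" is the backspace character (U+0008), not the regex word-boundary anchor, so the pattern never matches. I just changed this line

pattern = @"\b" + word + "\b";

to this

pattern = @"\b" + word + @"\b"; // added '@' so \b reaches the regex engine as a word boundary

and I got the result

THIS IS AMAZING WEBSITE LAYOUT

You can also write String.Empty instead of ""; the two are equivalent, so this is only a matter of style:

stringToClean = Regex.Replace(stringToClean, pattern, String.Empty);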
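
Two things are worth noting about the loop approach: it runs one Regex.Replace per word, and each removal leaves the surrounding spaces behind, so the raw output contains doubled spaces. A possible alternative is to build a single alternation pattern and then collapse the whitespace. The sketch below assumes the words are passed as one space-separated string, as in the question; the class name RegexWordFilter is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWordFilter
{
    public static string RemoveWords(string input, string wordsToRemove)
    {
        // Escape each word so regex metacharacters are matched literally,
        // then join them into one alternation: \b(?:DE|DA|...)\b
        var alternatives = wordsToRemove
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(word => Regex.Escape(word));
        string pattern = @"\b(?:" + string.Join("|", alternatives) + @")\b";

        // A single pass removes every listed word that appears as a whole word.
        string stripped = Regex.Replace(input, pattern, string.Empty);

        // Collapse the whitespace runs left behind and trim the ends.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

Called with the question's input and word list, this should return "THIS IS AMAZING WEBSITE LAYOUT" with single spaces between the words.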
Shaharyar answered Sep 28 '22 06:09