
Linq for getting words in sentences

Tags: string, c#, linq

I have a List of words and a List of sentences. I want to know which words can be found in which sentences.

Here is my code:

List<string> sentences = new List<string>();
List<string> words = new List<string>();

sentences.Add("Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur.");
sentences.Add("Alea iacta est.");
sentences.Add("Libenter homines id, quod volunt, credunt.");

words.Add("est");
words.Add("homines");

List<string> myResults = sentences
  .Where(sentence => words
     .Any(word => sentence.Contains(word)))
  .ToList();

What I need is a list of tuples, each holding a sentence and the word that was found in that sentence.

sonyfuchs asked Dec 07 '22 12:12


2 Answers

First, we have to define what a word is. Let it be any non-empty sequence of letters and apostrophes.

  Regex regex = new Regex(@"[\p{L}']+");
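To make that definition concrete, here is a minimal sketch of what the pattern extracts from a text. The class WordSplitter and its Split method are hypothetical names used only for illustration; the pattern itself is the one shown above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordSplitter
{
    // Same pattern as above: one or more letters (any script) or apostrophes.
    private static readonly Regex WordPattern = new Regex(@"[\p{L}']+");

    // Hypothetical helper: returns the words of a text in order of appearance.
    // Punctuation, digits and whitespace act as separators and are dropped.
    public static List<string> Split(string text) =>
        WordPattern
            .Matches(text)
            .Cast<Match>()
            .Select(match => match.Value)
            .ToList();
}
```

Calling WordSplitter.Split("Alea iacta est.") returns the three words Alea, iacta and est; the trailing full stop is not part of any word.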

Second, we should decide how to handle case. Let's implement a case-insensitive lookup:

  HashSet<string> wordsToFind = new HashSet<string>(StringComparer.OrdinalIgnoreCase) {
    "est",
    "homines"
  };

Then we can use the Regex to extract the words of each sentence, and LINQ to keep only the words we are looking for:

Code:

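  var actualWords = sentences
    // Split every sentence into its words, remembering the sentence's position
    .Select((text, index) => new {
      text = text,
      index = index,
      words = regex
        .Matches(text)
        .Cast<Match>()
        .Select(match => match.Value)
        .ToArray()
    })
    // Keep only the words we are looking for; report 1-based sentence numbers
    .SelectMany(item => item.words
       .Where(word => wordsToFind.Contains(word))
       .Select(word => Tuple.Create(word, item.index + 1)));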

  string report = string.Join(Environment.NewLine, actualWords);

  Console.Write(report);

Outcome:

  (est, 1)         // est appears in the 1st sentence
  (est, 2)         // est appears in the 2nd sentence as well
  (homines, 3)     // homines appears in the 3rd sentence

If you want a Tuple<string, string> of word and sentence, just replace Tuple.Create(word, item.index + 1) with Tuple.Create(word, item.text) in the last Select.
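For reference, the (word, sentence) variant could be put together roughly as follows. This is only a sketch: the class WordSentencePairs and its Find method are hypothetical names, and the index bookkeeping is dropped because it is no longer needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordSentencePairs
{
    // A word is any run of letters and apostrophes, as defined above.
    private static readonly Regex WordPattern = new Regex(@"[\p{L}']+");

    // Hypothetical helper: returns one (word, sentence) tuple for every
    // occurrence of a wanted word in a sentence.
    public static List<Tuple<string, string>> Find(
        IEnumerable<string> sentences, IEnumerable<string> words)
    {
        // Case-insensitive lookup, same as wordsToFind above.
        var wordsToFind = new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);

        return sentences
            .SelectMany(sentence => WordPattern
                .Matches(sentence)
                .Cast<Match>()
                .Select(match => match.Value)
                .Where(word => wordsToFind.Contains(word))
                .Select(word => Tuple.Create(word, sentence)))
            .ToList();
    }
}
```

Note that the word in each tuple is spelled as it appears in the sentence, and that a word occurring twice in one sentence yields two tuples; add a Distinct() before ToList() if that is not wanted.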

Dmitry Bychenko answered Dec 10 '22 01:12


Do you just mean this?

IEnumerable<(string, string)> query =
    from sentence in sentences
    from word in words
    where sentence.Contains(word)
    select (sentence, word);

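That gives one (sentence, word) tuple per match. With the sample data, "est" is paired with the first and second sentences and "homines" with the third.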
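One thing to keep in mind: string.Contains tests for substrings, so "est" would also be reported for a sentence containing a longer word such as "Honestas". If only whole words should count, the same query can use a word-boundary check instead. The sketch below assumes that behaviour is wanted; WholeWordQuery, ContainsWord and Find are hypothetical names, and the \b anchors only behave as expected for words that start and end with a letter or digit.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeWordQuery
{
    // Hypothetical helper: true only when `word` occurs in `sentence`
    // as a whole word (case-insensitive), not as part of a longer word.
    public static bool ContainsWord(string sentence, string word) =>
        Regex.IsMatch(sentence, $@"\b{Regex.Escape(word)}\b", RegexOptions.IgnoreCase);

    // Same shape as the query above, with Contains swapped for ContainsWord.
    public static List<(string, string)> Find(
        IEnumerable<string> sentences, IEnumerable<string> words) =>
        (from sentence in sentences
         from word in words
         where ContainsWord(sentence, word)
         select (sentence, word)).ToList();
}
```

For example, "Honestas vincit.".Contains("est") is true, while WholeWordQuery.ContainsWord("Honestas vincit.", "est") is false.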

Enigmativity answered Dec 10 '22 00:12