I'm writing a program to help me search for a keyword inside thousands of files. Each of these files has unnecessary lines that I need to ignore because they distort the results. Luckily, they are all located after a specific line inside those files.
What I've already got is a search that does not yet ignore the lines after that specific line. It returns an enumerable of the names of the files containing the keyword.
var searchResults = files.Where(file => File.ReadLines(file.FullName)
                                           .Any(line => line.Contains(keyWord)))
                         .Select(file => file.FullName);
Is there a simple and fast way to implement this functionality? It doesn't necessarily have to be in LINQ, as I'm not even sure whether that would be possible.
Edit:
An example to make it clearer.
This is how the text files are structured:
xxx
xxx
string
yyy
yyy
I want to search the xxx lines until either the keyword or the line "string" is found, and then skip to the next file. The yyy lines should be ignored in my search.
Try this:
var searchResults = files.Where(file => File.ReadLines(file.FullName)
                                           .TakeWhile(line => line != "STOP")
                                           .Any(line => line.Contains(keyWord)))
                         .Select(file => file.FullName);
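You can process the files in parallel by adding AsParallel() after "files". This may improve processing speed, depending on how much of the time is spent waiting on the disk. ReadLines does not read the whole file before the search starts, so reading stops as soon as the keyword or the stop line is reached.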
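As a rough sketch of the AsParallel() suggestion above, the query could be wrapped in a helper like the one below. The class name KeywordSearch, the method name FindMatches and the stopLine parameter are made up for illustration and are not part of the original code; the sketch assumes the stop marker is a whole line, as in the question's example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, named here for illustration only.
public static class KeywordSearch
{
    // Returns the full names of the files in which keyWord occurs
    // on some line before the first line equal to stopLine.
    public static List<string> FindMatches(
        IEnumerable<FileInfo> files, string keyWord, string stopLine)
    {
        return files
            .AsParallel()                              // PLINQ: evaluate files on several threads
            .Where(file => File.ReadLines(file.FullName)
                .TakeWhile(line => line != stopLine)   // stop reading at the marker line
                .Any(line => line.Contains(keyWord)))  // stop reading at the first hit
            .Select(file => file.FullName)
            .ToList();                                 // run the query and collect the results
    }
}
```

Note that AsParallel() does not preserve the order of the input files unless AsOrdered() is added after it, and whether it is actually faster is worth measuring on the real data.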
EDIT: Sorry, I misread your question the first time and didn't notice the stop word. Given that, I think it would be easier to avoid LINQ:
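// Needs: using System.Collections.Concurrent; using System.Threading.Tasks;
var result = new ConcurrentBag<string>();
// Parallel.ForEach distributes the files across threads; a plain foreach
// over files.AsParallel() would still process them one at a time.
Parallel.ForEach(files, file =>
{
    foreach (string line in File.ReadLines(file.FullName))
    {
        if (line.Contains(keyWord))
        {
            result.Add(file.FullName);
            break; // keyword found: no need to read further
        }
        else if (line.Contains(stopWord))
        {
            break; // stop line reached: ignore the rest of the file
        }
    }
});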
It's only a minor modification: read each file only up to the first occurrence of the marker line, and search just those lines for the keyword:
var searchResults = files.Where(file => File.ReadLines(file.FullName)
                                           .TakeWhile(line => line != myString)
                                           .Any(line => line.IndexOf(keyWord) > -1)
                               )
                         .Select(file => file.FullName);