
Searching in text files until specific string

I'm writing a program to search text files, each of which contains a specific string. The goal is to ignore everything after that string. My current code reads each file in full and returns an enumerable of the names of the files in which the search term was found.

var searchResults = files.Where(file => File.ReadAllText(file.FullName).Contains(searchTerm)).Select(file => file.FullName);

Would it be possible to also ignore all lines after that specific string? Performance is important, as there are thousands of files.

drouning asked Feb 10 '23 10:02


2 Answers

You can change your query to:

var searchResults = files.Where(file => File.ReadLines(file.FullName).Any(line => line.Contains(searchTerm)))
                         .Select(file => file.FullName);

Instead of File.ReadAllText you can use File.ReadLines, which is lazily evaluated: Any stops enumerating at the first matching line, so the remaining lines of that file are never processed.

https://msdn.microsoft.com/en-us/library/vstudio/dd383503(v=vs.100).aspx
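Note that stopping at the first match is not quite the same as ignoring everything after a separate marker string. If the search should only cover the part of each file before the marker, one possible sketch is to cut the lazy line sequence with TakeWhile. The names MarkerSearch, ContainsBeforeMarker and marker are illustrative assumptions, not taken from the question, and the sketch assumes the marker line itself should be excluded from the search.

```csharp
using System;
using System.IO;
using System.Linq;

static class MarkerSearch
{
    // Returns true if searchTerm occurs on a line before the first line
    // that contains marker. The marker line and all later lines are not searched.
    public static bool ContainsBeforeMarker(string path, string searchTerm, string marker)
    {
        return File.ReadLines(path)
                   // Stop enumerating as soon as the marker line is reached
                   .TakeWhile(line => !line.Contains(marker))
                   // Stop again at the first line that contains the search term
                   .Any(line => line.Contains(searchTerm));
    }
}
```

It could then replace the predicate in the query, for example: files.Where(file => MarkerSearch.ContainsBeforeMarker(file.FullName, searchTerm, marker)).Select(file => file.FullName).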

To make it faster you can also try Parallel LINQ:

var searchResults = files.AsParallel()
                         .Where(file => File.ReadLines(file.FullName).Any(line => line.Contains(searchTerm)))
                         .Select(file => file.FullName);

w.b answered Feb 12 '23 00:02


You can read each file line by line and close it as soon as the value is found:

    // Requires: using System.Collections.Generic; using System.IO;
    static string[] SearchFiles(string[] filesSrc, string searchTerm)
    {
        List<string> result = new List<string>();

        for (int i = 0; i < filesSrc.Length; i++)
        {
            // using disposes each reader, even when the loop exits early via break
            using (StreamReader reader = new StreamReader(filesSrc[i]))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // Stop reading this file at the first matching line
                    if (line.Contains(searchTerm)) { result.Add(filesSrc[i]); break; }
                }
            }
        }

        return result.ToArray();
    }

And use it like this: string[] files = SearchFiles(yourfiles, "searchTerm");

Depending on what you need, you could instead pass a File[] to this method and read the full path from each element, but you didn't show your File class, so it is difficult to implement that without knowing what the class actually looks like.
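If the files in the question happen to be System.IO.FileInfo objects, which the use of file.FullName suggests but the question does not confirm, a variant along these lines might work. The class name FileSearch is made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;

static class FileSearch
{
    // Assumption: the caller's file objects are System.IO.FileInfo instances.
    public static string[] SearchFiles(IEnumerable<FileInfo> filesSrc, string searchTerm)
    {
        var result = new List<string>();

        foreach (FileInfo file in filesSrc)
        {
            // using closes each reader, even when the inner loop breaks early
            using (StreamReader reader = file.OpenText())
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    if (line.Contains(searchTerm))
                    {
                        // Record the full path and skip the rest of this file
                        result.Add(file.FullName);
                        break;
                    }
                }
            }
        }

        return result.ToArray();
    }
}
```

Usage would then look like: string[] found = FileSearch.SearchFiles(files, "searchTerm");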

P.S. Using LINQ is another possible solution, and a good one (not to mention that it takes just one or two lines of code).

An improvised performance test showed that LINQ is only 10-20% slower in this case, so it's probably better to stick with it.
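For anyone who wants to repeat such a comparison on their own files, a rough timing helper based on Stopwatch could look like the sketch below. This is not the test that produced the figures above; the names Timing and Measure are made up for illustration, and a single run like this is only a coarse measurement because the OS file cache favours whichever variant runs second.

```csharp
using System;
using System.Diagnostics;

static class Timing
{
    // Runs the action once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

For example: TimeSpan loopTime = Timing.Measure(() => SearchFiles(paths, searchTerm)); When timing the LINQ version, end the query with ToList() so that the lazy query actually runs inside the measured action.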

Fabjan answered Feb 12 '23 01:02