Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to make a C# 'grep' more Functional using LINQ?

I have a method that performs a simplistic 'grep' across files, using an enumerable of "search strings". (Effectively, I'm doing a very naive "Find All References")

IEnumerable<string> searchStrings = GetSearchStrings();
IEnumerable<string> filesToLookIn = GetFiles();
MultiMap<string, string> references = new MultiMap<string, string>();

foreach( string fileName in filesToLookIn )
{
    foreach( string line in File.ReadAllLines( fileName ) )
    {
        foreach( string searchString in searchStrings )
        {
            if( line.Contains( searchString ) )
            {
                references.AddIfNew( searchString, fileName );
            }
        }
    }
}

Note: MultiMap<TKey,TValue> is roughly the same as Dictionary<TKey,List<TValue>>, just avoiding the NullReferenceExceptions you'd normally encounter.


I have been trying to put this into a more "functional" style, using chained LINQ extension methods but haven't figured it out.

One dead-end attempt:

// I get lost on how to do a loop within a loop here...
// plus, I lose track of the file name
var lines = filesToLookIn.Select( f => File.ReadAllLines( f ) ).Where( // ???

And another (hopefully preserving the file name this time):

var filesWithLines =
    filesToLookIn
        .Select(f => new { FileName = f, Lines = File.ReadAllLines(f) });

var matchingSearchStrings =
    searchStrings
        .Where(ss => filesWithLines.Any(
                         fwl => fwl.Lines.Any(l => l.Contains(ss))));

But I still seem to lose the information I need.

Maybe I'm just approaching this from the wrong angle? From a performance standpoint, the loops ought to perform in roughly the same order as the original example.

Any ideas of how to do this in a more compact functional representation?

like image 365
Jonathan Mitchem Avatar asked Feb 04 '23 10:02

Jonathan Mitchem


1 Answers

How about:

var matches =
    from fileName in filesToLookIn
    from line in File.ReadAllLines(fileName)
    from searchString in searchStrings
    where line.Contains(searchString)
    select new
    {
        FileName = fileName,
        SearchString = searchString
    };

    foreach(var match in matches)
    {
        references.AddIfNew(match.SearchString, match.FileName);
    }

Edit:

Conceptually, the query turns each file name into a set of lines, then cross-joins that set of lines to the set of search strings (meaning each line is paired with each search string). That set is filtered to matching lines, and the relevant information for each line is selected.

The multiple from clauses are similar to nested foreach statements. Each indicates a new iteration in the scope of the previous one. Multiple from clauses translate into the SelectMany method, which selects a sequence from each element and flattens the resulting sequences into one sequence.

All of C#'s query syntax translates to extension methods. However, the compiler does employ some tricks. One is the use of anonymous types. Whenever 2+ range variables are in the same scope, they are probably part of an anonymous type behind the scenes. This allows arbitrary amounts of scoped data to flow through extension methods like Select and Where, which have fixed numbers of arguments. See this post for further details.

Here is the extension method translation of the above query:

var matches = filesToLookIn
    .SelectMany(
        fileName => File.ReadAllLines(fileName),
        (fileName, line) => new { fileName, line })
    .SelectMany(
        anon1 => searchStrings,
        (anon1, searchString) => new { anon1, searchString })
    .Where(anon2 => anon2.anon1.line.Contains(anon2.searchString))
    .Select(anon2 => new
    {
        FileName = anon2.anon1.fileName,
        SearchString = anon2.searchString
    });
like image 118
Bryan Watts Avatar answered Feb 06 '23 12:02

Bryan Watts