
Running the same LINQ query on multiple IQueryables in parallel?

Situation: I have a List<IQueryable<MyDataStructure>>. I want to run a single LINQ query on each of them, in parallel, and then combine the results.

Question: How can I create a LINQ query that I can pass as a parameter?

Example code:

Here's some simplified code. First, I have the collection of IQueryable<string>:

    public List<IQueryable<string>> GetQueries()
    {
        var set1 = (new List<string> { "hello", "hey" }).AsQueryable();
        var set2 = (new List<string> { "cat", "dog", "house" }).AsQueryable();
        var set3 = (new List<string> { "cat", "dog", "house" }).AsQueryable();
        var set4 = (new List<string> { "hello", "hey" }).AsQueryable();

        var sets = new List<IQueryable<string>> { set1, set2, set3, set4 };

        return sets;
    }

I would like to find all the words which start with the letter 'h'. With a single IQueryable<string> this is easy:

    query.Where(x => x.StartsWith("h")).ToList()

But I want to run the same query against all the IQueryable<string> objects in parallel and then combine the results. Here's one way to do it:

        var result = new ConcurrentBag<string>();
        Parallel.ForEach(queries, query =>
        {
            var partOfResult = query.Where(x => x.StartsWith("h")).ToList();

            foreach (var word in partOfResult)
            {
                result.Add(word);
            }
        });

        Console.WriteLine(result.Count);

But I want a more generic solution, so that I can define the LINQ operation separately and pass it to a method as a parameter. Something like this:

        var query = Where(x => x.FirstName.StartsWith("d") && x.IsRemoved == false)
            .Select(x => x.FirstName)
            .OrderBy(x => x.FirstName);

        var queries = GetQueries();

        var result = Run(queries, query);

But I'm at a loss as to how to do this. Any ideas?

Mikael Koskinen asked Dec 26 '22 09:12



2 Answers

The first thing you want is a way to take a sequence of queries, execute all of them, and get back the flattened sequence of results. That's simple enough:

public static IEnumerable<T> Foo<T>(IEnumerable<IQueryable<T>> queries)
{
    return queries.AsParallel()
            .Select(query => query.ToList())
            .SelectMany(results => results);
}

Each query is executed by calling ToList on it. AsParallel makes those calls run in parallel, and SelectMany then flattens the results into a single sequence.
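As a rough illustration, here is how Foo could be applied to the string sets from the question. This is only a sketch: the class name ParallelQueryDemo and the Main method are made up for the example, and the final sort is there because PLINQ does not promise any particular order unless you ask for one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelQueryDemo
{
    // Same shape as Foo above: execute every query in parallel, then flatten.
    public static IEnumerable<T> Foo<T>(IEnumerable<IQueryable<T>> queries)
    {
        return queries.AsParallel()
            .Select(query => query.ToList())
            .SelectMany(results => results);
    }

    // The four in-memory sets from the question.
    public static List<IQueryable<string>> GetQueries()
    {
        return new List<IQueryable<string>>
        {
            new List<string> { "hello", "hey" }.AsQueryable(),
            new List<string> { "cat", "dog", "house" }.AsQueryable(),
            new List<string> { "cat", "dog", "house" }.AsQueryable(),
            new List<string> { "hello", "hey" }.AsQueryable(),
        };
    }

    public static void Main()
    {
        // Attach the filter to each query; nothing is executed at this point.
        var filtered = GetQueries().Select(q => q.Where(x => x.StartsWith("h")));

        // Sort the combined result so the printed output is stable.
        var words = Foo(filtered).OrderBy(w => w, StringComparer.Ordinal).ToList();

        Console.WriteLine(words.Count);              // 6
        Console.WriteLine(string.Join(", ", words)); // hello, hello, hey, hey, house, house
    }
}
```

If the order of the combined result matters, sorting once after the merge (as above) is probably simpler than trying to preserve order inside the parallel part.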

The other thing you want is to apply the same query operations to each query in the sequence. This doesn't need to be parallelized (thanks to deferred execution, the calls to Where, OrderBy, etc. only build up the query and take almost no time), so it can be done with a plain Select:

var queries = GetQueries().Select(query =>
    query.Where(x => x.FirstName.StartsWith("d")
        && !x.IsRemoved)
    .Select(x => x.FirstName)
    // After the Select each item is already the first name (a string).
    .OrderBy(firstName => firstName));

var results = Foo(queries);
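To make the deferred execution point concrete, here is a small sketch that counts how often a predicate is evaluated. The names DeferredExecutionDemo, StartsWithH and Measure are invented for this example, and it assumes an in-memory list wrapped with AsQueryable, as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class DeferredExecutionDemo
{
    private static int _calls;

    // Hypothetical predicate that records every evaluation.
    public static bool StartsWithH(string word)
    {
        Interlocked.Increment(ref _calls);
        return word.StartsWith("h");
    }

    // Returns the call count before and after enumeration, plus the match count.
    public static (int Before, int After, int Matches) Measure()
    {
        _calls = 0;
        var source = new List<string> { "hello", "cat", "hey" }.AsQueryable();

        // Composing the query only builds an expression tree.
        var query = source.Where(x => StartsWithH(x));
        int before = _calls;

        // Enumerating the query (here via ToList) is what does the work.
        var result = query.ToList();
        return (before, _calls, result.Count);
    }

    public static void Main()
    {
        var m = Measure();
        Console.WriteLine(m.Before);  // 0: the predicate has not run yet
        Console.WriteLine(m.After);   // 3: one call per element
        Console.WriteLine(m.Matches); // 2: "hello" and "hey"
    }
}
```

This is why only the ToList call needs to happen inside the parallel part. Composing the query is just bookkeeping.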

Personally I don't see a need to combine these two methods, since they're really rather separate concepts. If you do want them combined though, here it is:

public static IEnumerable<TResult> Bar<TSource, TResult>(
    IEnumerable<IQueryable<TSource>> queries,
    Func<IQueryable<TSource>, IQueryable<TResult>> selector)
{

    return queries.Select(selector)
        .AsParallel()
        .Select(query => query.ToList())
        .SelectMany(results => results);
}

Feel free to make either Foo or Bar an extension method if you want. Also, you really should give them better names if you're going to use them.
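For what it's worth, here is one way the combined method might look as an extension method, with the query defined once and passed in as a parameter, which is roughly what the question asks for. Treat it as a sketch: Person, QueryRunner and RunOnAll are hypothetical names, and the Person class only has the two properties the question implies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; the question only implies FirstName and IsRemoved.
public sealed class Person
{
    public string FirstName { get; set; } = "";
    public bool IsRemoved { get; set; }
}

public static class QueryRunner
{
    // Bar from above, renamed and turned into an extension method.
    public static IEnumerable<TResult> RunOnAll<TSource, TResult>(
        this IEnumerable<IQueryable<TSource>> queries,
        Func<IQueryable<TSource>, IQueryable<TResult>> selector)
    {
        return queries.Select(selector)
            .AsParallel()
            .Select(query => query.ToList())
            .SelectMany(results => results);
    }

    public static void Main()
    {
        var queries = new List<IQueryable<Person>>
        {
            new List<Person>
            {
                new Person { FirstName = "dave" },
                new Person { FirstName = "dora", IsRemoved = true },
            }.AsQueryable(),
            new List<Person>
            {
                new Person { FirstName = "anna" },
                new Person { FirstName = "dan" },
            }.AsQueryable(),
        };

        // The reusable query, defined once and passed as a parameter.
        Func<IQueryable<Person>, IQueryable<string>> query = q => q
            .Where(x => x.FirstName.StartsWith("d") && !x.IsRemoved)
            .Select(x => x.FirstName);

        // Sort after combining, since order across sources is not guaranteed.
        var names = queries.RunOnAll(query)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();

        Console.WriteLine(string.Join(", ", names)); // dan, dave
    }
}
```

Note that an OrderBy inside the passed-in query only sorts each source on its own, so the sketch sorts the merged result instead.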

Servy answered Dec 29 '22 00:12



First, given your current implementation (in-memory lists wrapped with AsQueryable), there is no reason to use IQueryable<T>. You could just use IEnumerable<T>.

You could then write a method which takes an IEnumerable<IEnumerable<T>> and a Func<IEnumerable<T>, IEnumerable<U>>, to build a result:

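IEnumerable<IEnumerable<U>> QueryMultiple<T,U>(IEnumerable<IEnumerable<T>> inputs, Func<IEnumerable<T>,IEnumerable<U>> mapping)
{
     // ToList forces each mapped query to run inside the parallel Select.
     // Without it, mapping(i) only builds a lazy sequence and nothing is parallelized.
     return inputs.AsParallel().Select(i => (IEnumerable<U>)mapping(i).ToList());
}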

You could then use this as:

void Run()
{
    IEnumerable<IEnumerable<YourType>> inputs = GetYourObjects();

    // The query projects to FirstName, so it yields strings rather than YourType.
    Func<IEnumerable<YourType>, IEnumerable<string>> query = i =>
       i.Where(x => x.FirstName.StartsWith("d") && !x.IsRemoved)
        .Select(x => x.FirstName)
        .OrderBy(firstName => firstName);

    var results = QueryMultiple(inputs, query);
}

Reed Copsey answered Dec 29 '22 00:12
