Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this Cross Join so Slow in Linq?

I wrote this piece of Linq to handle doing a CROSS Join just like a database would between multiple lists.

But for some reason it's extremely slow when any of the lists go more than 3000. I'd wait for 30s ? These lists could go to very large numbers.

This query is looped for each relationship with the other list's data coming from ColumnDataIndex.

Any Advice ?

UPDATE ** - The data is inserted into normal lists that are built before hand from the configured sources. This is all in memory at the moment.

RunningResult[parameter.Uid] = (from source_row in RunningResult[parameter.Uid]
                            from target_row in ColumnDataIndex[dest_key]
                            where GetColumnFromUID(source_row, rel.SourceColumn) == GetColumnFromUID(target_row, rel.TargetColumn)
                            select new Row()
                            {
                                Columns = MergeColumns(source_row.Columns, target_row.Columns)

                            }).ToList();

The 2 extra functions:

MergeColumns: Takes the Columns from the 2 items and merges them into a single array.

public static Columnn[] MergeColumns(Column[] source_columns, Column[] target_columns)
{
      Provider.Data.BucketColumn[] new_column = new Provider.Data.BucketColumn[source_columns.Length + target_columns.Length];
      source_columns.CopyTo(new_column, 0);
      target_columns.CopyTo(new_column, source_columns.Length);
      return new_column;
  }

GetColumnFromUID: Returns the Value of the Column in the Item matching the column uid given.

private static String GetColumnFromUID(Row row, String column_uid)
  {
       if (row != null)
       {
           var dest_col = row.Columns.FirstOrDefault(col => col.ColumnUid == column_uid);
           return dest_col == null ? "" + row.RowId : dest_col.Value.ToString().ToLower();
       }
       else return String.Empty;

  }

Update:

Ended up moving the data and the query to a database. This reduced to the speed to a number of ms. Could have written a optimized looped function but this was the fastest way out for me.

like image 987
Johann du Toit Avatar asked Mar 12 '13 13:03

Johann du Toit


People also ask

What is LINQ Cross join?

What is Linq Cross Join? When combining two data sources (or you can two collections) using Linq Cross Join, then each element in the first data source (i.e. first collection) will be mapped with each and every element in the second data source (i.e. second collection).

Does LINQ make this faster?

Both of the LINQ examples don't only perform better, but they also make the intentions more clear in my opinion. As pointed on out in a reddit thread, it isn't LINQ that's making this faster. The cause of these improvements is found in the lookup of the Join method, not LINQ itself.

Do the LINQ extension methods really slow things down?

My little investigation shows that using the LINQ extension methods ElementAt (), FirstOrDefault () and SingleOrDefault () really can slow things down if used inside loops and the number of iterations > 1000. For single calls and a low number of iterations the LINQ methods are fast.

How to implement the cross join using method syntax?

In order to implement the Cross Join using method syntax, we need to use either the SelectMany () method or the Join () method as shown in the below example. It will give you the same result as the previous example.


1 Answers

You don't actually need to be performing a cross join. Cross joins are inherently expensive operations. You shouldn't be doing that unless you really need it. In your case what you really need is just an inner join. You're performing a cross join which is resulting in lots of values that you don't need at all, and then you're filtering out a huge percentage of those values to leave you with the few that you need. If you just did an inner join from the start you would only compute the values that you need. That will save you from needing to create a whole lot of rows you don't need just to have them be thrown away.

LINQ has its own inner join operation, Join, so you don't even need to write your own:

RunningResult[parameter.Uid] = (from source_row in RunningResult[parameter.Uid]
                                join target_row in ColumnDataIndex[dest_key]
                                on GetColumnFromUID(source_row, rel.SourceColumn) equals
                                    GetColumnFromUID(target_row, rel.TargetColumn)
                                select new Row()
                                {
                                    Columns = MergeColumns(source_row.Columns, target_row.Columns)

                                }).ToList();
like image 115
Servy Avatar answered Sep 21 '22 11:09

Servy