Performance: .Join vs .Contains - Linq to Entities

I am using LINQ to Entities to query the database for a list of int values for further processing. I have two ways to get the list:

First is:

List<int> lstBizIds = new List<int>() { 1, 2, 3, 4, 5 };
List<int> lstProjectIds = context.Projects.Where(x => lstBizIds.Contains(x.businessId)).Select(x => x.projectId).ToList();

Second is:

List<int> lstBizIds = new List<int>() { 1, 2, 3, 4, 5 };
List<int> lstProjectIds = context.Projects.Join(lstBizIds, p => p.businessId, u => u, (p, u) => p.projectId).ToList();

Which of the two methods above performs better? Does performance change as the first list, lstBizIds, grows? Please also suggest other implementations if both of these hurt performance.

Girish Vadhel asked Nov 17 '16 09:11

2 Answers

You should go with Contains, because EF can produce a more efficient query.

This would be roughly the SQL for the Join:

SELECT projectId
FROM Projects
INNER JOIN (VALUES (1), (2), (3), (4), (5)) AS Data(Item) ON Projects.businessId = Data.Item

This would be roughly the SQL for the Contains:

SELECT projectId
FROM Projects
WHERE businessId IN (1, 2, 3, 4, 5)

IN can be more efficient than JOIN because, for each row, the DBMS can stop looking after the first match in the IN list; the JOIN always runs to completion, even after the first match.
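
The difference between an existence test and a join can be seen in memory as well. The following is a minimal sketch using LINQ to Objects, not EF; the `Project` class, the `SemiJoinDemo` class and the sample data are hypothetical stand-ins for the Projects table. It only shows the semantics: Contains yields each project at most once, while Join yields one row per matching pair, so a duplicate id in the list duplicates the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a row of the Projects table.
public class Project
{
    public int ProjectId { get; set; }
    public int BusinessId { get; set; }
}

public static class SemiJoinDemo
{
    // Hypothetical sample data, not taken from the question.
    public static readonly List<Project> Projects = new List<Project>
    {
        new Project { ProjectId = 10, BusinessId = 1 },
        new Project { ProjectId = 20, BusinessId = 2 },
        new Project { ProjectId = 30, BusinessId = 9 },
    };

    // Existence test (like IN): each project appears at most once.
    public static List<int> ViaContains(List<int> bizIds) =>
        Projects.Where(p => bizIds.Contains(p.BusinessId))
                .Select(p => p.ProjectId)
                .ToList();

    // Inner join: one output row per matching (project, id) pair.
    public static List<int> ViaJoin(List<int> bizIds) =>
        Projects.Join(bizIds, p => p.BusinessId, id => id, (p, id) => p.ProjectId)
                .ToList();

    public static void Main()
    {
        var ids = new List<int> { 1, 1, 2 }; // note the duplicate 1

        Console.WriteLine(string.Join(", ", ViaContains(ids))); // 10, 20
        Console.WriteLine(string.Join(", ", ViaJoin(ids)));     // 10, 10, 20
    }
}
```

If the id list may contain duplicates, the Join version needs a Distinct() on the list (or on the result) to return the same rows as the Contains version.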

You might also want to check which queries are actually sent to the DB. You always have to compare the SQL, not the LINQ code (obviously).

Sefe answered Nov 16 '22 01:11


A join is the more efficient option, because a Where condition actually performs a Cartesian product of all the tables and then filters the rows that satisfy the condition. This means the Where condition is evaluated for each combination of rows (n1 * n2 * n3 * n4).

The Join operator takes the rows from the first table, then only the rows with a matching key from the second table, then only the rows with a matching key from the third table, and so on. In addition, Contains works iteratively, which makes it slower than Join.
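
The counting argument above can be sketched in memory. This is a hedged illustration with LINQ to Objects only; the `JoinCostDemo` class and its method names are hypothetical. It counts how often the filter predicate runs for a cross product followed by Where, and how often the key selectors run for `Enumerable.Join`. It says nothing about what a database engine does with the generated SQL, where the query plan decides the actual cost.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinCostDemo
{
    // Cross product followed by a filter: the predicate runs once per pair.
    public static int CountPredicateCalls(List<int> outer, List<int> inner)
    {
        int calls = 0;
        outer.SelectMany(o => inner, (o, i) => new { o, i })
             .Where(pair => { calls++; return pair.o == pair.i; })
             .Count(); // Count() forces the deferred query to run
        return calls;
    }

    // Enumerable.Join: each key selector runs once per element of its sequence.
    public static int CountKeySelectorCalls(List<int> outer, List<int> inner)
    {
        int calls = 0;
        outer.Join(inner,
                   o => { calls++; return o; },
                   i => { calls++; return i; },
                   (o, i) => o)
             .Count(); // Count() forces the deferred query to run
        return calls;
    }

    public static void Main()
    {
        var outer = Enumerable.Range(1, 100).ToList(); // n1 = 100
        var inner = new List<int> { 1, 2, 3, 4, 5 };   // n2 = 5

        Console.WriteLine(CountPredicateCalls(outer, inner));   // n1 * n2 = 500
        Console.WriteLine(CountKeySelectorCalls(outer, inner)); // n1 + n2 = 105
    }
}
```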

g.005 answered Nov 16 '22 01:11