Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the most efficient way to do comparisons involving many-many relationships with LINQ in EF 4.1?

In my database I have the following tables:

  • Person
  • Post
  • InterestTag

Many-Many relationships exist between Person-InterestTag and Post-InterestTag

I need to perform a linq query in EF 4.1 to pull back any post that contains at least one interest tag that matches at least one interest tag related to the given user.

Example

A person has the following interests:

  • Cars
  • Sports
  • Fitness

I need to return any Post that is related to either cars, sports, or fitness.

What is the most efficient way to write this query in terms of performance?

Edit

Running into an error based on the answer given below...

This compiles fine but throws an error at runtime:

var matchingPosts = posts.Where(post => post.Topics.Any(postTopic =>   person.Interests.Contains(postTopic)));

The error is:

Unable to create a constant value of type 'System.Collections.Generic.ICollection`1'. Only primitive types ('such as Int32, String, and Guid') are supported in this context.

Any ideas how to fix this?

EDIT 2

So my classes are structured as such:

public class Person
{
    public int PersonID {get; set;}
    public string FirstName {get; set;}
    public string LastName {get; set;}
    //other properties of types string, int, DateTime, etc.

    public ICollection<InterestTag> InterestTags {get; set;}      
}


public class Post
{
    public int PostID {get; set;}
    public string Title{get; set;}
    public string Content {get; set;}
    //other properties of types string, int, DateTime, etc.

    public ICollection<InterestTag> InterestTags {get; set;}


}

public class InterestTag
{
    public int InterestTagID { get; set; }
    public string InterestDescription { get; set; }
    public bool Active { get; set; }

    public ICollection<Person> Persons { get; set; }
    public ICollection<Post> Posts { get; set; }
}

In my Context class I am overriding OnModelCreating to define my DB table names

modelBuilder.Entity<Person>().HasMany(u => u.InterestTags).WithMany(t => t.Persons)
    .Map(m =>
    {
        m.MapLeftKey("PersonID");
        m.MapRightKey("InterestTagID");
        m.ToTable("PersonInterestTags");
    });

modelBuilder.Entity<Post>().HasMany(u => u.InterestTags).WithMany(t => t.Posts)
    .Map(m =>
    {
        m.MapLeftKey("PostID");
        m.MapRightKey("InterestTagID");
        m.ToTable("PostInterestTags");
    });

In my query method I am bring back an IQueryable of Post and applying some filters, including the clause in which I am trying to accomplish in this question.

 var person = personRepository.Get(x => x.PersonID = 5);
 var posts = postRepository.GetQueryable();

 //I have tried this and get the error above
 posts= posts.Where(x => x.InterestTags.Any(tag => person.InterestTags.Contains(tag)));
like image 806
stephen776 Avatar asked Oct 19 '11 12:10

stephen776


3 Answers

If you start with just a given personId (or userId) you can do this query in one roundtrip like so:

var posts = context.Posts
    .Intersect(context.People
        .Where(p => p.Id == givenPersonId)
        .SelectMany(p => p.InterestTags.SelectMany(t => t.Posts)))
    .ToList();

This translates into an INTERSECT statement in SQL.

You can also do this in two roundtrips:

var interestTagsOfPerson = context.People.Where(p => p.Id == givenPersonId)
    .Select(p => p.InterestTags.Select(t => t.Id))
    .SingleOrDefault();
// Result is an IEnumerable<int> which contains the Id of the tags of this person

var posts = context.Posts
    .Where(p => p.InterestTags.Any(t => interestTagsOfPerson.Contains(t.Id)))
    .ToList();
// Contains translates into an IN clause in SQL

Using a list of primitive types in the second query (interestTagsOfPerson is a collection of int) also fixes the error you mentioned in your Edit in the question. For Contains you cannot use object references in LINQ to Entities because EF doesn't know how to translate that into SQL.

I have no clue which of the two approaches is faster (SQL experts might have a better idea) but would probably start to test the first option. (I've tested a little bit and it seemed to return the correct results, but it's the first time that I have used Intersect.)

Edit

To give an idea of the generated SQL (captured from SQL Profiler):

The first query (with Intersect) creates this SQL query:

SELECT 
[Intersect1].[Id] AS [C1], 
[Intersect1].[Name] AS [C2], 
FROM  (SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[Name] AS [Name], 
    FROM [dbo].[Posts] AS [Extent1]
INTERSECT
    SELECT 
    [Join1].[Id] AS [Id], 
    [Join1].[Name] AS [Name], 
    FROM  [dbo].[PersonInterestTags] AS [Extent2]
        INNER JOIN (SELECT [Extent3].[TagId] AS [TagId],
                           [Extent4].[Id] AS [Id],
                           [Extent4].[Name] AS [Name]
                    FROM  [dbo].[PostInterestTags] AS [Extent3]
                    INNER JOIN [dbo].[Posts] AS [Extent4]
                        ON [Extent3].[PostId] = [Extent4].[Id] ) AS [Join1]
            ON [Extent2].[TagId] = [Join1].[TagId]
    WHERE 1 = [Extent2].[PersonId]) AS [Intersect1]

The second option:

Query1 (for the list of the person's tag Ids):

SELECT 
[Project1].[Id] AS [Id], 
[Project1].[C1] AS [C1], 
[Project1].[TagId] AS [TagId]
FROM ( SELECT 
    [Limit1].[Id] AS [Id], 
    [Extent2].[TagId] AS [TagId], 
    CASE WHEN ([Extent2].[PersonId] IS NULL)
                 THEN CAST(NULL AS int)
                 ELSE 1
             END AS [C1]
    FROM   (SELECT TOP (2) [Extent1].[Id] AS [Id]
        FROM [dbo].[People] AS [Extent1]
        WHERE 1 = [Extent1].[Id] ) AS [Limit1]
    LEFT OUTER JOIN [dbo].[PersonInterestTags] AS [Extent2]
            ON [Limit1].[Id] = [Extent2].[PersonId]
)  AS [Project1]
ORDER BY [Project1].[Id] ASC, [Project1].[C1] ASC

Query 2 for the final posts:

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[Name] AS [Name], 
FROM [dbo].[Posts] AS [Extent1]
WHERE  EXISTS (SELECT 
            1 AS [C1]
        FROM [dbo].[PostInterestTags] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[PostId]) AND ([Extent2].[TagId] IN (1,2,3))
)

In this example the query 1 returned (1,2,3), hence the (1,2,3) in the IN clause in query 2.

like image 140
Slauma Avatar answered Oct 21 '22 21:10

Slauma


Edit: I had to edit my post because I forgot about the many-many-relationship between post and topic. Now it should work.

I cannot tell you if this is the most efficient way, but it would be a way using LINQ queries so it should be highly efficient:

var matchingPosts = posts.Where(post => post.Topics.Any(postTopic => person.Interests.Contains(postTopic)));

If you want to use parallel execution you could modify it like that:

var matchingPosts = posts.AsParallel().Where(post => post.Topics.Any(postTopic => person.Interests.Contains(postTopic)));

As you are using EF you would need this query:

var matchingPosts = from post in posts
    where post.Topics.Any(topic => person.Interests.Contains(topic))
    select post;
like image 3
MatthiasG Avatar answered Oct 21 '22 23:10

MatthiasG


how about this:

context.Persons
       .Where(p => p.Name == "x")
       .SelectMany(p => p.Interests.SelectMany(i => i.Posts))
       .Distinct()
       .Take(10)
       .ToList();

Take() is there for performance reasons and pagination. You should never select all records because first of all no user will read a list of thousands records and second the result set could grow in the future and the query will not scale.

like image 3
Cosmin Onea Avatar answered Oct 21 '22 22:10

Cosmin Onea