Matching similar entities based on many to many relationship

Question

I have two entities in my database that are connected with a many to many relationship. I was wondering what would be the best way to list which entities have the most similarities based on it?

I tried doing a count(*) with intersect, but the query takes too long to run on every entry in my database (there are about 20k records). When running the query I wrote, CPU usage jumps to 100% and the database has locking issues.

Here is some code showing what I've tried:

My tables look something along these lines:

/* 20k records */
create table Movie(
   Id INT PRIMARY KEY,
   Title varchar(255)
);

/* 200-300 records */
create table Tags(
   Id INT PRIMARY KEY,
   Desc varchar(255)
);

/* 200,000-300,000 records */
create table TagMovies(
    Movie_Id INT,
    Tag_Id INT,
    PRIMARY KEY (Movie_Id, Tag_Id),
    FOREIGN KEY (Movie_Id) REFERENCES Movie(Id),
    FOREIGN KEY (Tag_Id) REFERENCES Tags(Id),
);

(This works, but it is terribly slow) This is the query that I wrote to try and list them: Usually I also filter with top 1 & add a where clause to get a specific set of related data.

SELECT 
    bk.Id,
    rh.Id
FROM
    Movies bk
    CROSS APPLY (
        SELECT TOP 15
           b.Id,
           /* Tags Score */
           (
            SELECT COUNT(*) FROM (
                SELECT x.Tag_Id FROM TagMovies x WHERE x.Movie_Id = bk.Id
                INTERSECT
                SELECT x.Tag_Id FROM TagMovies x WHERE x.Movie_Id = b.Id
                ) Q1
           )
           as Amount
        FROM 
            Movies b 
        WHERE 
            b.Id <> bk.Id
        ORDER BY Amount DESC
    ) rh

Explanation: Movies have tags and the user can get try to find movies similar to the one that they selected based on other movies that have similar tags.

barat · Accepted Answer

Hmm ... just an idea, but maybe I didnt understand ... This query should return best matched movies by tags for a given movie ID:

SELECT m.id, m.title, GROUP_CONCAT(DISTINCT t.Descr SEPARATOR ', ') as tags, count(*) as matches
FROM stack.Movie m 
LEFT JOIN stack.TagMovies tm ON m.Id = tm.Movie_Id
LEFT JOIN stack.Tags t ON tm.Tag_Id = t.Id
WHERE m.id != 1
AND tm.Tag_Id IN (SELECT Tag_Id FROM stack.TagMovies tm WHERE tm.Movie_Id = 1)
GROUP BY m.id
ORDER BY matches DESC
LIMIT 15;

EDIT: I just realized that it's for M$ SQL ... but maybe something similar can be done...

Tom H · Answer

You should probably decide on a naming convention and stick with it. Are tables singular or plural nouns? I don't want to get into that debate, but pick one or the other.

Without access to your database I don't know how this will perform. It's just off the top of my head. You could also limit this by the M.id value to find the best matches for a single movie, which I think would improve performance by quite a bit.

Also, TOP x should let you get the x closest matches.

SELECT
    M.id,
    M.title,
    SM.id AS similar_movie_id,
    SM.title AS similar_movie_title,
    COUNT(*) AS matched_tags
FROM
    Movie M
INNER JOIN TagsMovie TM1 ON TM1.movie_id = M.movie_id
INNER JOIN TagsMovie TM2 ON
    TM2.tag_id = TM1.tag_id AND
    TM2.movie_id <> TM1.movie_id
INNER JOIN Movie SM ON SM.movie_id = TM2.movie_id
GROUP BY
    M.id,
    M.title,
    SM.id AS similar_movie_id,
    SM.title AS similar_movie_title
ORDER BY
    COUNT(*) DESC

Matching similar entities based on many to many relationship

Tags:

sql

sql-server

tsql

newb

2 Answers

barat

Tom H

Recent Activity

Donate For Us

Matching similar entities based on many to many relationship

Tags:

sql

sql-server

tsql

newb

2 Answers

barat

Tom H

Related questions

Recent Activity

Donate For Us