Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Looking for advice on a "related videos" query on a tagged video system

Tags:

php

mysql

Well I run a small video website and on the actual video page there is a strip of "related videos" similar to most video sides (e.g. YouTube) and currently all I'm doing is taking one of its tags randomly and finding other videos with the same tag. Not surprisingly this isn't a great method as some tags are very vague and some videos are mis-tagged.

Example of the current query:

SELECT video_name FROM videos INNER JOIN videotags ON videos.id=videotags.video_id INNER JOIN tags ON tags.id=videotags.tag_id WHERE tag_name='x' AND videos.id<>'y' LIMIT 5

Where x is any one of the tags from the current video and y is the ID from the current video. (P.S. I'm using parameterized queries don't worry)

I'm just curious as to how you all would handle this, maybe it would be better to incorporate similar video titles?

Here is how my database tables are setup:

VIDEOS TABLE
------------
video_id [PK,auto_increment] int(11)
video_name varchar(255)

TAGS TABLE
----------
tag_id [PK,auto_increment] int(11)
tag_name varchar(255)

VIDEOTAGS TABLE
---------------
tag_id [PK,FK] int(11)
video_id [PK,FK] int(11)

There's obviously more columns in the videos table but this just illustrates the simple many-to-many relationship with auto-incrementing primary keys on both sides

The site is built on PHP with a MySQL database, but that really doesn't matter :)

EDIT: There's been some talk of going down an organic route so I figure I'd post my other two tables that are semi-related that are to do with video views and video ratings. Now note I don't have any intention of adding more columns specifically to the video views table because of privacy issues (yes I know I store IPs in the rating table)

VIDEOVIEWS TABLE
----------------
video_id [FK] int(11)
view_time datetime

VIDEORATINGS TABLE
------------------
video_id [PK,FK] int(11)
ip_address [PK] varchar(15)
rating int(1)
rate_time datetime
like image 814
Andrew G. Johnson Avatar asked Dec 20 '08 19:12

Andrew G. Johnson


2 Answers

This query should return the id's of videos (v2) that have tags in common with your given video (v1), in descending order of the number of tags in common.

SELECT v2.video_id
FROM VideoTags AS v1
  JOIN VideoTags AS v2
  USING (tag_id)
WHERE v1.video_id = ?
  AND v1.video_id <> v2.video_id
GROUP BY v2.video_id 
ORDER BY COUNT(*) DESC;
like image 63
Bill Karwin Avatar answered Oct 27 '22 11:10

Bill Karwin


Very interesting question.

This is just thinking out loud, but some options I can think of are:

1) Use all the tags - for example imagine queries for the list of videos that have each tag that this video does. Produce a list of videos ordered by the count of the number of those lists they appear on, i.e. the count of how many tags they have in common with this video. Ones with more tags in common are presumably 'more related'.

(I'm not suggesting you do multiple queries in reality, just trying to explain what I have in mind...someone with better SQL-fu than me can probably come up with a single query that does this. Perhaps you can additionally order by popularity or other information that you may have).

2) Try to come with an algorithm that lets the related videos naturally emerge, a la amazon 'people who bought this also bought this'. For example if you track who viewed what, you may be able to design a query that produces such a list.

like image 1
frankodwyer Avatar answered Oct 27 '22 10:10

frankodwyer