SQL multiple table join throwing dupes

Tags: sql, join, mysql

I'm having some issues with a query throwing up dupes when I do an inner join between some tables.

Essentially, I am joining tags and channels (think genre) to videos on videoId, so a particular video file can have multiple tags and be part of multiple channels.

Unfortunately, multiple matches appear :( My gut feeling is that the query matches against the videoId for each join and spits out the doubled-up values (as you will see in the linked fiddle). My grand total of SQL join experience dates from about 3 hours ago; I've managed to muddle along OK so far, but could now do with a little expert advice on where to go from here.

Below are the schema and the query I have been wrestling with (also available as an sqlfiddle).

Schema:

create table videos 
(
   videoId int(1) AUTO_INCREMENT,
   videoUUID char(40),
   contentFolder varchar(50), 
   title varchar(50), 
   caption varchar(50),
   duration varchar(15),
   date TIMESTAMP,
   url varchar(55),
   text varchar(250), 
   PRIMARY KEY (videoId)
);

create table tags 
( 
   tagId int(1) AUTO_INCREMENT,
   tagName varchar(15),
   PRIMARY KEY (tagId)
);

create table channels
(
   channelId int(1) AUTO_INCREMENT,
   channelName varchar(15),
   PRIMARY KEY (channelId)
);

create table videoTags 
(
   videoTagId int(1) AUTO_INCREMENT,
   videoId int(1),
   tagId int(1),
   PRIMARY KEY (videoTagId)
);

create table videoChannels 
(
   videoChannelId int(1) AUTO_INCREMENT,
   videoId int(1),
   channelId int(1),
   PRIMARY KEY (videoChannelId)
);

//
create trigger tuuid before insert on videos
for each row begin
set new.videoUUID = uuid();
end//

insert into videos 
  (contentFolder, title,caption,duration,url,text)
values ("someDir/","A Movie Title", "Headline for Movie",
    "00:05:11",
    "http://someserver.ip/somedir/test.mp4",
    "Some text as part of the video file description here");

insert into tags (tagName) values
('Flowers'),('Dogs'),('Cats'),('YaMum'),('orlyowl');

insert into channels (channelName) values
('General'), ('NotSoGeneral'), ('Specific'), ('Broad'), ('Narrow'),
('Obsolete'); 

insert into videoTags (videoId,tagId) values
(1,2),(1,5);

insert into videoChannels (videoId,channelId) values
(1,1),(1,4),(1,6);

Query:

select distinct v.*
  , group_concat(t.tagName) Tags
  , group_concat(c.channelName) Channels
from videos as v
inner join videoTags as vt on v.videoId = vt.videoId
inner join tags as t on t.tagId = vt.tagId
inner join videoChannels as vc on v.videoId = vc.videoId
inner join channels as c on c.channelId = vc.channelId
group by v.videoId;

Am I going about this the right way? Is there something fundamentally wrong with the way I've structured the query that causes this? Or is the schema wrong for what I am trying to accomplish?

Any help would be greatly appreciated!

@J Lo Thanks for the answer!

Answer: the DISTINCT keyword. I had tried it in the select statement, but was unsure of its usage and didn't fully appreciate where else it could go. As @J Lo so succinctly pointed out, I can use it inside group_concat, which gives this query:

select v.*
  , group_concat(distinct t.tagName) Tags
  , group_concat(distinct c.channelName) Channels
from videos as v
inner join videoTags as vt on v.videoId = vt.videoId
inner join tags as t on t.tagId = vt.tagId
inner join videoChannels as vc on v.videoId = vc.videoId
inner join channels as c on c.channelId = vc.channelId
group by v.videoId;
splintex asked Oct 20 '22 23:10


1 Answer

Your first couple of joins (videos/videoTags/tags) yield a result like so:

videoId = 1 brings in tagId = 2 and 5 (Dogs, orlyowl), so you have this:

| videoId | tagName |
| 1       | Dogs    |
| 1       | orlyowl |

When you then join to videoChannels, each of those rows is repeated once per channel (channelId 1, 4 and 6), giving 2 tags x 3 channels = 6 rows:

| videoId | tagName | channelId |
| 1       | Dogs    | 1         |
| 1       | orlyowl | 1         |
| 1       | Dogs    | 4         |
| 1       | orlyowl | 4         |
| 1       | Dogs    | 6         |
| 1       | orlyowl | 6         |
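
To see this multiplication for yourself, it can help to run the same joins with no GROUP BY and no group_concat. The following is a minimal sketch against the schema and sample data from the question; the ORDER BY is only there to make the output easier to read, and with the sample inserts it should come back with six rows for videoId 1.

```sql
-- Same joins as in the question, minus GROUP BY and group_concat,
-- so every intermediate row produced by the joins is visible.
select v.videoId, t.tagName, c.channelName
from videos as v
inner join videoTags as vt on vt.videoId = v.videoId
inner join tags as t on t.tagId = vt.tagId
inner join videoChannels as vc on vc.videoId = v.videoId
inner join channels as c on c.channelId = vc.channelId
order by c.channelId, t.tagId;  -- ordering is for readability only
```

Each tag is paired with every channel of the same video, because the two link tables are related only through videoId.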

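group_concat concatenates every one of those rows, so each tag shows up three times and each channel twice. group_concat accepts a DISTINCT modifier that removes the repeats: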

select v.*
  , group_concat(distinct t.tagName) Tags
  , group_concat(distinct c.channelName) Channels
from videos as v 
inner join videoTags as vt on v.videoId = vt.videoid
inner join tags as t on t.tagId = vt.tagId
inner join videoChannels as vc on v.videoId = vc.videoId
inner join channels as c on c.channelId = vc.channelId
group by v.videoId;
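
A different way to get the same lists is to avoid the row multiplication in the first place by aggregating tags and channels separately before joining them to videos. This is only a sketch under the question's schema, not something verified against the fiddle; the derived-table aliases tagList and channelList are made up for illustration.

```sql
-- Aggregate each link table on its own, then join one row per video.
-- tagList and channelList are hypothetical alias names.
select v.*, tagList.Tags, channelList.Channels
from videos as v
inner join (
    -- one row per video holding its comma-separated tag names
    select vt.videoId, group_concat(t.tagName) as Tags
    from videoTags as vt
    inner join tags as t on t.tagId = vt.tagId
    group by vt.videoId
) as tagList on tagList.videoId = v.videoId
inner join (
    -- one row per video holding its comma-separated channel names
    select vc.videoId, group_concat(c.channelName) as Channels
    from videoChannels as vc
    inner join channels as c on c.channelId = vc.channelId
    group by vc.videoId
) as channelList on channelList.videoId = v.videoId;
```

Because each derived table already has at most one row per videoId, the outer joins cannot multiply rows, so DISTINCT is not needed. Swapping the two outer inner joins for left joins would also keep videos that have no tags or no channels, which the original query drops.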
jdl answered Oct 27 '22 22:10