Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Selecting from Table A where it joins to all data in Table B

I have a table of documents, and a table of tags. The documents are tagged with various values.

I am attempting to create a search of these tags, and for the most part it is working. However, I am getting extra results returned when it matches any tag. I only want results where it matches all tags.

I have created this to illustrate the problem http://sqlfiddle.com/#!3/8b98e/11

Tables and Data:

CREATE TABLE Documents
(
 DocId INT,
 DocText VARCHAR(500)
);

CREATE TABLE Tags
(
  TagId INT,
  TagName VARCHAR(50)
);

CREATE TABLE DocumentTags
(
  DocTagId INT,
  DocId INT,
  TagId INT,
  Value VARCHAR(50)
);

INSERT INTO Documents VALUES (1, 'Document 1 Text');
INSERT INTO Documents VALUES (2, 'Document 2 Text');

INSERT INTO Tags VALUES (1, 'Tag Name 1');
INSERT INTO Tags VALUES (2, 'Tag Name 2');

INSERT INTO DocumentTags VALUES (1, 1, 1, 'Value 1');
INSERT INTO DocumentTags VALUES (1, 1, 2, 'Value 2');
INSERT INTO DocumentTags VALUES (1, 2, 1, 'Value 1');

Code:

-- Set up the parameters
DECLARE @TagXml VARCHAR(max)
SET @TagXml = '<tags>
                  <tag>
                    <description>Tag Name 1</description>
                    <value>Value 1</value>
                  </tag>
                  <tag>
                    <description>Tag Name 2</description>
                    <value>Value 2</value>
                  </tag>
                </tags>'

-- Create a table to store the parsed xml in
DECLARE @XmlTagData TABLE 
(
    id varchar(20)
    ,[description] varchar(100)
    ,value varchar(250)
)

-- Populate our XML table
DECLARE @iTag int
EXEC sp_xml_preparedocument @iTag OUTPUT, @TagXml
-- Execute a SELECT statement that uses the OPENXML rowset provider
-- to produce a table from our xml structure and insert it into our temp table
INSERT INTO @XmlTagData (id, [description], value)
SELECT  id, [description], value
FROM    OPENXML (@iTag, '/tags/tag',1)
        WITH (id varchar(20),
                [description] varchar(100) 'description',
                value varchar(250) 'value')

EXECUTE sp_xml_removedocument @iTag

-- Update the XML table Id's to match existsing Tag Id's
UPDATE      @XmlTagData
SET         X.Id = T.TagId
FROM        @XmlTagData X
INNER JOIN  Tags T ON X.[description] = T.TagName

-- Check it looks right
--SELECT * 
--FROM @XmlTagData

-- This is where things do not quite work. I get both doc 1 & 2 back, 
-- but what I want is just document 1.
-- i.e. documents that have both tags with matching values
SELECT DISTINCT D.*
FROM Documents D
INNER JOIN DocumentTags T ON T.DocId = D.DocId
INNER JOIN @XmlTagData X ON X.id = T.TagId AND X.value = T.Value

(Note I am not a DBA, so there may be better ways of doing things. Hopefully I am on the right track, but I am open to other suggestions if my implementation can be improved.)

Can anyone offer any suggestions on how to get only results that have all tags?

Many thanks.

like image 215
Yetiish Avatar asked May 13 '13 16:05

Yetiish


People also ask

Which join is to be used between tables A and B when all rows from A and only matching rows from B are required?

INNER JOIN This type of join returns those records which have matching values in both tables.

Which join is to be used between two tables A and B when the?

SQL cross join A CROSS join returns all rows for all possible combinations of two tables. It generates all the rows from the left table which is then combined with all the rows from the right table. This type of join is also known as a Cartesian product(A*B).

How do I SELECT data from one table to another?

The INSERT INTO SELECT statement copies data from one table and inserts it into another table. The INSERT INTO SELECT statement requires that the data types in source and target tables match. Note: The existing records in the target table are unaffected.

How do I SELECT all data from a table in SQL?

In its most simple form, the SELECT clause has the following SQL syntax for a Microsoft SQL Server database: SELECT * FROM <TableName>; This SQL query will select all columns and all rows from the table.


1 Answers

Use option with [NOT] EXISTS and EXCEPT operators in the last query

SELECT *
FROM Documents D
WHERE NOT EXISTS (
                  SELECT X.ID , X.Value
                  FROM @XmlTagData X 
                  EXCEPT
                  SELECT T.TagId, T.VALUE
                  FROM DocumentTags T
                  WHERE T.DocId = D.DocId
                  )

Demo on SQLFiddle

OR

SELECT *
FROM Documents D
WHERE EXISTS (
              SELECT X.ID , X.Value
              FROM @XmlTagData X 
              EXCEPT
              SELECT T.TagId, T.VALUE
              FROM DocumentTags T
              WHERE T.DocId != D.DocId
              )   

Demo on SQLFiddle

OR

Also you can use a simple solution with XQuery methods: nodes(), value()) and CTE/Subquery.

-- Set up the parameters
DECLARE @TagXml XML
SET @TagXml = '<tags>
                  <tag>
                    <description>Tag Name 1</description>
                    <value>Value 1</value>
                  </tag>
                  <tag>
                    <description>Tag Name 2</description>
                    <value>Value 2</value>
                  </tag>              
                </tags>'               


;WITH cte AS
 (
  SELECT TagValue.value('(./value)[1]', 'nvarchar(100)') AS value,
         TagValue.value('(./description)[1]', 'nvarchar(100)') AS [description]       
  FROM @TagXml.nodes('/tags/tag') AS T(TagValue)
  )
  SELECT *
  FROM Documents D
  WHERE NOT EXISTS (
                    SELECT T.TagId, c.value
                    FROM cte c JOIN Tags T WITH(FORCESEEK) 
                      ON c.[description] = T.TagName
                    EXCEPT
                    SELECT T.TagId, T.VALUE
                    FROM DocumentTags T WITH(FORCESEEK)
                    WHERE T.DocId = D.DocId                          
                    )

Demo on SQLFiddle

OR

-- Set up the parameters
DECLARE @TagXml XML
SET @TagXml = '<tags>
                  <tag>
                    <description>Tag Name 1</description>
                    <value>Value 1</value>
                  </tag>
                  <tag>
                    <description>Tag Name 2</description>
                    <value>Value 2</value>
                  </tag>              
                </tags>'      

  SELECT *
  FROM Documents D
  WHERE NOT EXISTS (
                    SELECT T2.TagId,
                           TagValue.value('(./value)[1]', 'nvarchar(100)') AS value                           
                    FROM @TagXml.nodes('/tags/tag') AS T(TagValue)
                      JOIN Tags T2 WITH(FORCESEEK)
                        ON TagValue.value('(./description)[1]', 'nvarchar(100)') = T2.TagName                                        
                    EXCEPT
                    SELECT T.TagId, T.VALUE
                    FROM DocumentTags T WITH(FORCESEEK)
                    WHERE T.DocId = D.DocId                       
                    )

Demo on SQLFiddle

In order to improving performance(forced operation of index seek on the Tags and DocumentTags tables), use indexes and table hints(FORCESEEK hint added to the query above):

CREATE INDEX x ON Documents(DocId) INCLUDE(DocText)
CREATE INDEX x ON Tags(TagName) INCLUDE(TagId)
CREATE INDEX x ON DocumentTags(DocId) INCLUDE(TagID, VALUE)
like image 101
12 revs Avatar answered Oct 09 '22 17:10

12 revs