I have a table of documents, and a table of tags. The documents are tagged with various values.
I am attempting to create a search of these tags, and for the most part it is working. However, I am getting extra results returned when it matches any tag. I only want results where it matches all tags.
I have created this to illustrate the problem http://sqlfiddle.com/#!3/8b98e/11
Tables and Data:
CREATE TABLE Documents
(
DocId INT,
DocText VARCHAR(500)
);
CREATE TABLE Tags
(
TagId INT,
TagName VARCHAR(50)
);
CREATE TABLE DocumentTags
(
DocTagId INT,
DocId INT,
TagId INT,
Value VARCHAR(50)
);
INSERT INTO Documents VALUES (1, 'Document 1 Text');
INSERT INTO Documents VALUES (2, 'Document 2 Text');
INSERT INTO Tags VALUES (1, 'Tag Name 1');
INSERT INTO Tags VALUES (2, 'Tag Name 2');
INSERT INTO DocumentTags VALUES (1, 1, 1, 'Value 1');
INSERT INTO DocumentTags VALUES (1, 1, 2, 'Value 2');
INSERT INTO DocumentTags VALUES (1, 2, 1, 'Value 1');
Code:
-- Set up the parameters
DECLARE @TagXml VARCHAR(max)
SET @TagXml = '<tags>
<tag>
<description>Tag Name 1</description>
<value>Value 1</value>
</tag>
<tag>
<description>Tag Name 2</description>
<value>Value 2</value>
</tag>
</tags>'
-- Create a table to store the parsed xml in
DECLARE @XmlTagData TABLE
(
id varchar(20)
,[description] varchar(100)
,value varchar(250)
)
-- Populate our XML table
DECLARE @iTag int
EXEC sp_xml_preparedocument @iTag OUTPUT, @TagXml
-- Execute a SELECT statement that uses the OPENXML rowset provider
-- to produce a table from our xml structure and insert it into our temp table
INSERT INTO @XmlTagData (id, [description], value)
SELECT id, [description], value
FROM OPENXML (@iTag, '/tags/tag',1)
WITH (id varchar(20),
[description] varchar(100) 'description',
value varchar(250) 'value')
EXECUTE sp_xml_removedocument @iTag
-- Update the XML table Id's to match existsing Tag Id's
UPDATE @XmlTagData
SET X.Id = T.TagId
FROM @XmlTagData X
INNER JOIN Tags T ON X.[description] = T.TagName
-- Check it looks right
--SELECT *
--FROM @XmlTagData
-- This is where things do not quite work. I get both doc 1 & 2 back,
-- but what I want is just document 1.
-- i.e. documents that have both tags with matching values
SELECT DISTINCT D.*
FROM Documents D
INNER JOIN DocumentTags T ON T.DocId = D.DocId
INNER JOIN @XmlTagData X ON X.id = T.TagId AND X.value = T.Value
(Note I am not a DBA, so there may be better ways of doing things. Hopefully I am on the right track, but I am open to other suggestions if my implementation can be improved.)
Can anyone offer any suggestions on how to get only results that have all tags?
Many thanks.
INNER JOIN This type of join returns those records which have matching values in both tables.
SQL cross join A CROSS join returns all rows for all possible combinations of two tables. It generates all the rows from the left table which is then combined with all the rows from the right table. This type of join is also known as a Cartesian product(A*B).
The INSERT INTO SELECT statement copies data from one table and inserts it into another table. The INSERT INTO SELECT statement requires that the data types in source and target tables match. Note: The existing records in the target table are unaffected.
In its most simple form, the SELECT clause has the following SQL syntax for a Microsoft SQL Server database: SELECT * FROM <TableName>; This SQL query will select all columns and all rows from the table.
Use option with [NOT] EXISTS and EXCEPT operators in the last query
SELECT *
FROM Documents D
WHERE NOT EXISTS (
SELECT X.ID , X.Value
FROM @XmlTagData X
EXCEPT
SELECT T.TagId, T.VALUE
FROM DocumentTags T
WHERE T.DocId = D.DocId
)
Demo on SQLFiddle
OR
SELECT *
FROM Documents D
WHERE EXISTS (
SELECT X.ID , X.Value
FROM @XmlTagData X
EXCEPT
SELECT T.TagId, T.VALUE
FROM DocumentTags T
WHERE T.DocId != D.DocId
)
Demo on SQLFiddle
OR
Also you can use a simple solution with XQuery methods: nodes(), value()) and CTE/Subquery.
-- Set up the parameters
DECLARE @TagXml XML
SET @TagXml = '<tags>
<tag>
<description>Tag Name 1</description>
<value>Value 1</value>
</tag>
<tag>
<description>Tag Name 2</description>
<value>Value 2</value>
</tag>
</tags>'
;WITH cte AS
(
SELECT TagValue.value('(./value)[1]', 'nvarchar(100)') AS value,
TagValue.value('(./description)[1]', 'nvarchar(100)') AS [description]
FROM @TagXml.nodes('/tags/tag') AS T(TagValue)
)
SELECT *
FROM Documents D
WHERE NOT EXISTS (
SELECT T.TagId, c.value
FROM cte c JOIN Tags T WITH(FORCESEEK)
ON c.[description] = T.TagName
EXCEPT
SELECT T.TagId, T.VALUE
FROM DocumentTags T WITH(FORCESEEK)
WHERE T.DocId = D.DocId
)
Demo on SQLFiddle
OR
-- Set up the parameters
DECLARE @TagXml XML
SET @TagXml = '<tags>
<tag>
<description>Tag Name 1</description>
<value>Value 1</value>
</tag>
<tag>
<description>Tag Name 2</description>
<value>Value 2</value>
</tag>
</tags>'
SELECT *
FROM Documents D
WHERE NOT EXISTS (
SELECT T2.TagId,
TagValue.value('(./value)[1]', 'nvarchar(100)') AS value
FROM @TagXml.nodes('/tags/tag') AS T(TagValue)
JOIN Tags T2 WITH(FORCESEEK)
ON TagValue.value('(./description)[1]', 'nvarchar(100)') = T2.TagName
EXCEPT
SELECT T.TagId, T.VALUE
FROM DocumentTags T WITH(FORCESEEK)
WHERE T.DocId = D.DocId
)
Demo on SQLFiddle
In order to improving performance(forced operation of index seek on the Tags and DocumentTags tables), use indexes and table hints(FORCESEEK hint added to the query above):
CREATE INDEX x ON Documents(DocId) INCLUDE(DocText)
CREATE INDEX x ON Tags(TagName) INCLUDE(TagId)
CREATE INDEX x ON DocumentTags(DocId) INCLUDE(TagID, VALUE)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With