I've got a set of cities that have a many-to-many relationship with a set of tags. The user gives me a collection of tags (which may contain duplicates!), and I need to return a list of matching entries, sorted by relevance.
Here's some sample data to illustrate the problem:
Cities:
--------------------
| id | city |
--------------------
| 1 | Atlanta |
| 2 | Baltimore |
| 3 | Cleveland |
| 4 | Denver |
| 5 | Eugene |
--------------------
Tags:
------
| id |
------
| 1 |
| 2 |
| 3 |
| 4 |
------
The cities are tagged like this:
Atlanta: 1, 2
Baltimore: 3
Cleveland: 1, 3, 4
Denver: 2, 3
Eugene: 1, 4
...so the CityTags table looks like:
------------------------
| city_id | tag_id |
------------------------
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
| 3 | 3 |
| 3 | 4 |
| 4 | 2 |
| 4 | 3 |
| 5 | 1 |
| 5 | 4 |
------------------------
If the user gives me the tag ids: [1, 3, 3, 4], I want to count how many matches I have for each of the tags, and return a relevance-sorted result like:
------------------------
| city | matches |
------------------------
| Cleveland | 4 |
| Baltimore | 2 |
| Eugene | 2 |
| Atlanta | 1 |
| Denver | 1 |
------------------------
Since Cleveland matched all four tags, it's first, followed by Baltimore and Eugene, which each had two tags match, etc.
One more example to make for good measure. For the search [2, 2, 2, 3, 4], we'd get:
------------------------
| city | matches |
------------------------
| Denver | 4 |
| Atlanta | 3 |
| Cleveland | 2 |
| Baltimore | 1 |
| Eugene | 1 |
------------------------
If I ignore the repeated tags, then it's trivial:
SELECT name,COUNT(name) AS relevance FROM
(SELECT name FROM cities,citytags
WHERE id=city_id AND tag_id IN (1,3,3,4)) AS matches
GROUP BY name ORDER BY relevance DESC;
But that's not what I need. I need to respect the duplicates. Can someone suggest how I might accomplish this?
Aha! A temporary table is was I needed. Postgresql lets me do this with its WITH syntax. Here's the solution:
WITH search(tag) AS (VALUES (1), (3), (3), (4))
SELECT name, COUNT(name) AS relevance FROM cities
INNER JOIN citytags ON cities.id=citytags.city_id
INNER JOIN search ON citytags.tag_id=search.tag
GROUP BY name ORDER BY relevance DESC;
Thank you very much to those that answered.
If the user list comes in as a comma-separated list, you could try turning it into a temp table and join on that instead. I don't know the relveant syntax for PosteGRE, so here is the idea in MySql:
create temporary table usertags (tag_id int);
insert usertags values (1),(3),(3),(4);
SELECT name, COUNT(name) AS relevance
FROM cities
JOIN citytags on cities.id = citytags.city_id
JOIN usertags on citytags.tag_id = usertags.tag_id
GROUP BY name ORDER BY relevance DESC;
Converting the comma-separated list to the above code would be as simple as doing a replace all of ,
to ),(
using your server-side language, and then embedding that into a VALUES
statement to populate the temp table.
Demo (MySql): http://www.sqlize.com/1qNThhD9tC
Stick all the tags into a table and then JOIN instead of including them in an IN list.
CREATE TABLE #input (
tag_id INT NOT NULL
)
;
INSERT INTO #input
SELECT 1
UNION ALL SELECT 3
UNION ALL SELECT 3
UNION ALL SELECT 4
;
SELECT
city.name,
search.relevance
FROM
city
INNER JOIN
(
SELECT
city_id,
COUNT(*) AS relevance
FROM
citytags
INNER JOIN
#input
ON #input.tag_id = citytags.tag_id
GROUP BY
city_id
)
AS search
ON search.city_id = city.id
ORDER BY
search.relevance DESC
;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With