Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL query to search by multiple tags with relevance sorting

Tags:

sql

postgresql

I've got a set of cities that have a many-to-many relationship with a set of tags. The user gives me a collection of tags (which may contain duplicates!), and I need to return a list of matching entries, sorted by relevance.

The Data

Here's some sample data to illustrate the problem:

Cities:

--------------------
| id |    city     |
--------------------
|  1 |  Atlanta    |
|  2 |  Baltimore  |
|  3 |  Cleveland  |
|  4 |  Denver     |
|  5 |  Eugene     |
--------------------

Tags:

------
| id |
------
|  1 |
|  2 |
|  3 |
|  4 |
------

The cities are tagged like this:

Atlanta:   1, 2
Baltimore: 3
Cleveland: 1, 3, 4
Denver:    2, 3
Eugene:    1, 4

...so the CityTags table looks like:

------------------------
|  city_id  |  tag_id  |
------------------------
|     1     |     1    |
|     1     |     2    |
|     2     |     3    |
|     3     |     1    |
|     3     |     3    |
|     3     |     4    |
|     4     |     2    |
|     4     |     3    |
|     5     |     1    |
|     5     |     4    |
------------------------

Example 1

If the user gives me the tag ids: [1, 3, 3, 4], I want to count how many matches I have for each of the tags, and return a relevance-sorted result like:

------------------------
|    city    | matches |
------------------------
|  Cleveland |    4    |
|  Baltimore |    2    |
|  Eugene    |    2    |
|  Atlanta   |    1    |
|  Denver    |    1    |
------------------------

Since Cleveland matched all four tags, it's first, followed by Baltimore and Eugene, which each had two tags match, etc.

Example 2

One more example to make for good measure. For the search [2, 2, 2, 3, 4], we'd get:

------------------------
|    city    | matches |
------------------------
|  Denver    |    4    |
|  Atlanta   |    3    |
|  Cleveland |    2    |
|  Baltimore |    1    |
|  Eugene    |    1    |
------------------------

SQL

If I ignore the repeated tags, then it's trivial:

SELECT name,COUNT(name) AS relevance FROM
  (SELECT name FROM cities,citytags 
    WHERE id=city_id AND tag_id IN (1,3,3,4)) AS matches
  GROUP BY name ORDER BY relevance DESC;

But that's not what I need. I need to respect the duplicates. Can someone suggest how I might accomplish this?

Solution in Postgresql

Aha! A temporary table is was I needed. Postgresql lets me do this with its WITH syntax. Here's the solution:

WITH search(tag) AS (VALUES (1), (3), (3), (4))
SELECT name, COUNT(name) AS relevance FROM cities
INNER JOIN citytags ON cities.id=citytags.city_id
INNER JOIN search ON citytags.tag_id=search.tag
GROUP BY name ORDER BY relevance DESC;

Thank you very much to those that answered.

like image 369
Erik R. Avatar asked Oct 10 '11 15:10

Erik R.


2 Answers

If the user list comes in as a comma-separated list, you could try turning it into a temp table and join on that instead. I don't know the relveant syntax for PosteGRE, so here is the idea in MySql:

create temporary table usertags (tag_id int);
insert usertags values (1),(3),(3),(4);

SELECT name, COUNT(name) AS relevance
FROM cities
JOIN citytags on cities.id = citytags.city_id
JOIN usertags on citytags.tag_id = usertags.tag_id
GROUP BY name ORDER BY relevance DESC;

Converting the comma-separated list to the above code would be as simple as doing a replace all of , to ),( using your server-side language, and then embedding that into a VALUES statement to populate the temp table.

Demo (MySql): http://www.sqlize.com/1qNThhD9tC

like image 163
mellamokb Avatar answered Nov 16 '22 16:11

mellamokb


Stick all the tags into a table and then JOIN instead of including them in an IN list.

CREATE TABLE #input (
  tag_id INT NOT NULL
)
;

INSERT INTO #input
          SELECT 1
UNION ALL SELECT 3
UNION ALL SELECT 3
UNION ALL SELECT 4
;

SELECT
  city.name,
  search.relevance
FROM
  city
INNER JOIN
(
  SELECT
    city_id,
    COUNT(*) AS relevance
  FROM
    citytags
  INNER JOIN
    #input
      ON #input.tag_id = citytags.tag_id
  GROUP BY
    city_id
)
  AS search
    ON search.city_id = city.id
ORDER BY
  search.relevance DESC
;
like image 22
MatBailie Avatar answered Nov 16 '22 16:11

MatBailie