
Searching for items in a many-to-many relationship

Tags: python, sql, peewee

I'm currently writing an application that allows one to store images, and then tag these images. I'm using Python and the Peewee ORM (http://charlesleifer.com/docs/peewee/), which is very similar to Django's ORM.

My data model looks like this (simplified):

class Image(BaseModel):
    key = CharField()

class Tag(BaseModel):
    tag = CharField()

class TagRelationship(BaseModel):
    relImage = ForeignKeyField(Image)
    relTag   = ForeignKeyField(Tag)

Now, I understand conceptually how to query for all Images that have a given set of tags:

SELECT Image.key
  FROM Image
INNER JOIN TagRelationship
    ON Image.ID = TagRelationship.ImageID
INNER JOIN Tag
    ON TagRelationship.TagID = Tag.ID
 WHERE Tag.tag
       IN ( 'A' , 'B' )     -- list of multiple tags
GROUP BY Image.key
HAVING COUNT(*) = 2         -- where 2 == the number of tags specified, above

However, I also want to be able to do more complex searches. Specifically, I'd like to be able to specify three lists of tags: "all" (an image must have every one of these tags to be returned), "any" (it must have at least one of these) and "none" (it must have none of these).

EDIT: I'd like to clarify this a bit. Specifically, the above query is an "all tags"-style query. It returns Images that have all the given tags. I want to be able to specify something like: "Give me all images that have the tags (green, mountain), any one of the tags (background, landscape) but not the tags (digital, drawing)".

Now, ideally, I'd like this to be one SQL query, because pagination then becomes very easy with LIMIT and OFFSET. I've actually got an implementation working whereby I just load everything into Python sets and then use the various intersection operators. What I'm wondering is if there's a method of doing this all at once?
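
For comparison, the set-based approach described above might look roughly like the following sketch. The function name `search_images` and the `tags_by_image` mapping are hypothetical, chosen for illustration; they are not part of the asker's code or of Peewee.

```python
def search_images(tags_by_image, all_tags=(), any_tags=(), none_tags=()):
    """Return the sorted keys of images matching the all/any/none tag lists.

    tags_by_image is a hypothetical in-memory mapping: image key -> set of tags.
    """
    all_tags, any_tags, none_tags = set(all_tags), set(any_tags), set(none_tags)
    matches = []
    for key, tags in tags_by_image.items():
        if not all_tags <= tags:
            continue  # every "all" tag must be present
        if any_tags and not (any_tags & tags):
            continue  # at least one "any" tag, if any were given
        if none_tags & tags:
            continue  # no "none" tag may be present
        matches.append(key)
    return sorted(matches)


images = {
    "img1": {"green", "mountain", "landscape"},
    "img2": {"green", "mountain", "background", "digital"},
    "img3": {"green", "landscape"},
    "img4": {"green", "mountain"},
}

result = search_images(
    images,
    all_tags=["green", "mountain"],
    any_tags=["background", "landscape"],
    none_tags=["digital", "drawing"],
)
print(result)  # ['img1']
```

With this approach pagination has to be done by slicing the result list in Python, which is what a single SQL query with LIMIT and OFFSET would avoid.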

Also, for those interested, I've emailed the author of Peewee about how to represent the above query using Peewee, and he responded with the following solution:

Image.select(['key']).group_by('key').join(TagRelationship).join(Tag).where(tag__in=['tag1', 'tag2']).having('count(*) = 2')

Or, alternatively, a shorter version:

Image.filter(tagrelationship_set__relTag__tag__in=['tag1', 'tag2']).group_by(Image).having('count(*) = 2')

Thanks in advance for your time.

Asked by Andrew D on Jan 16 '12 at 06:01



2 Answers

SELECT Image.key
  FROM Image
  JOIN TagRelationship
    ON Image.ID = TagRelationship.ImageID
  JOIN Tag
    ON TagRelationship.TagID = Tag.ID
 GROUP BY Image.key
HAVING SUM(Tag.tag IN (mandatory tags )) = N  /*the number of mandatory tags*/
   AND SUM(Tag.tag IN (optional tags  )) > 0
   AND SUM(Tag.tag IN (prohibited tags)) = 0

UPDATE

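A more portable version of the above query. The first version relies on the database treating the boolean result of IN as an integer (as MySQL does); this one converts the boolean results of the IN predicates into integers explicitly using CASE expressions: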

SELECT Image.key
  FROM Image
  JOIN TagRelationship
    ON Image.ID = TagRelationship.ImageID
  JOIN Tag
    ON TagRelationship.TagID = Tag.ID
 GROUP BY Image.key
HAVING SUM(CASE WHEN Tag.tag IN (mandatory tags ) THEN 1 ELSE 0 END) = N  /*the number of mandatory tags*/
   AND SUM(CASE WHEN Tag.tag IN (optional tags  ) THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN Tag.tag IN (prohibited tags) THEN 1 ELSE 0 END) = 0

or with COUNTs instead of SUMs:

SELECT Image.key
  FROM Image
  JOIN TagRelationship
    ON Image.ID = TagRelationship.ImageID
  JOIN Tag
    ON TagRelationship.TagID = Tag.ID
 GROUP BY Image.key
HAVING COUNT(CASE WHEN Tag.tag IN (mandatory tags ) THEN 1 END) = N  /*the number of mandatory tags*/
   AND COUNT(CASE WHEN Tag.tag IN (optional tags  ) THEN 1 END) > 0
   AND COUNT(CASE WHEN Tag.tag IN (prohibited tags) THEN 1 END) = 0
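
To show how the tag lists and the pagination might be plugged in from Python, here is a minimal sketch using the built-in sqlite3 module rather than Peewee. The helper name `build_search`, the sample rows and the exact table layout (column names taken from the SQL above) are assumptions made for illustration. It also assumes each tag name is unique and each image/tag pair is stored only once, since otherwise the counts would be inflated.

```python
import sqlite3


def build_search(all_tags, any_tags, none_tags, limit=20, offset=0):
    """Build (sql, params) for an all/any/none tag search with pagination."""
    def counted(tags):
        # COUNT ignores NULLs, so this counts the rows whose tag is in `tags`.
        marks = ", ".join("?" for _ in tags)
        return f"COUNT(CASE WHEN Tag.tag IN ({marks}) THEN 1 END)"

    having, params = [], []
    if all_tags:
        having.append(f"{counted(all_tags)} = ?")
        params += [*all_tags, len(all_tags)]
    if any_tags:
        having.append(f"{counted(any_tags)} > 0")
        params += list(any_tags)
    if none_tags:
        having.append(f"{counted(none_tags)} = 0")
        params += list(none_tags)

    sql = (
        "SELECT Image.key FROM Image "
        "JOIN TagRelationship ON Image.ID = TagRelationship.ImageID "
        "JOIN Tag ON TagRelationship.TagID = Tag.ID "
        "GROUP BY Image.key "
        + ("HAVING " + " AND ".join(having) + " " if having else "")
        + "ORDER BY Image.key LIMIT ? OFFSET ?"
    )
    return sql, params + [limit, offset]


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Image (ID INTEGER PRIMARY KEY, key TEXT);
    CREATE TABLE Tag (ID INTEGER PRIMARY KEY, tag TEXT UNIQUE);
    CREATE TABLE TagRelationship (ImageID INTEGER, TagID INTEGER,
                                  UNIQUE (ImageID, TagID));
""")
sample = {
    "img1": ["green", "mountain", "landscape"],
    "img2": ["green", "mountain", "background", "digital"],
    "img3": ["green", "landscape"],
    "img4": ["green", "mountain", "background", "landscape"],
}
for key, tags in sample.items():
    image_id = conn.execute("INSERT INTO Image (key) VALUES (?)", (key,)).lastrowid
    for tag in tags:
        conn.execute("INSERT OR IGNORE INTO Tag (tag) VALUES (?)", (tag,))
        tag_id = conn.execute("SELECT ID FROM Tag WHERE tag = ?", (tag,)).fetchone()[0]
        conn.execute("INSERT INTO TagRelationship VALUES (?, ?)", (image_id, tag_id))

sql, params = build_search(["green", "mountain"],
                           ["background", "landscape"],
                           ["digital", "drawing"])
keys = [row[0] for row in conn.execute(sql, params)]
print(keys)  # ['img1', 'img4']
```

Because the query uses inner joins, an image with no tags at all never reaches the HAVING clause, so a search that only specifies prohibited tags would not return untagged images.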

Answered by Andriy M on Sep 28 '22 at 11:09


The query below is a UNION ALL of two halves. The top half gets the images that have all of the mandatory tags, after excluding any image that carries a prohibited tag. The bottom half handles the tags of which at least one must be present. The bottom half doesn't have a GROUP BY because I want to know whether an image appears twice: if it does, it has both background and landscape. The ORDER BY on count(*) makes pictures with both the background and landscape tags appear at the top. So pictures tagged green, mountain, background and landscape are the most relevant, followed by pictures tagged green, mountain and either background or landscape.

SELECT matches.key, COUNT(*) AS relevance
FROM
     (SELECT good.key
      FROM
        -- good image candidates
        (SELECT Image.ID, Image.key
         FROM Image
         WHERE Image.key NOT IN
            -- bad images
            (SELECT DISTINCT Image.key   -- DISTINCT removes duplicates, reducing the size of the set
             FROM Image
             INNER JOIN TagRelationship
                ON Image.ID = TagRelationship.ImageID
             INNER JOIN Tag
                ON TagRelationship.TagID = Tag.ID
             WHERE Tag.tag
                   IN ('digital', 'drawing'))) AS good
      INNER JOIN TagRelationship
          ON good.ID = TagRelationship.ImageID
      INNER JOIN Tag
          ON TagRelationship.TagID = Tag.ID
      WHERE Tag.tag
            IN ('green', 'mountain')
      GROUP BY good.key
      HAVING COUNT(*) = 2   -- we need green AND mountain; 2 == the number of mandatory tags

      UNION ALL

      -- get all images with one of the following 2 tags, one row per matching tag
      -- (this half is not limited to the good candidates above)
      SELECT Image.key
      FROM Image
      INNER JOIN TagRelationship
          ON Image.ID = TagRelationship.ImageID
      INNER JOIN Tag
          ON TagRelationship.TagID = Tag.ID
      WHERE Tag.tag
            IN ('background', 'landscape')
     ) AS matches
GROUP BY matches.key
ORDER BY relevance DESC
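
If the goal is to filter and rank in a single pass, the relevance idea can also be expressed without UNION ALL. The following is a rough sketch, not the query above: it uses conditional counts in HAVING for the mandatory, optional and prohibited tags, and takes the number of matching optional tags as the relevance score. It runs against Python's built-in sqlite3 module; the sample rows and the `RANKED_SEARCH` name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Image (ID INTEGER PRIMARY KEY, key TEXT);
    CREATE TABLE Tag (ID INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE TagRelationship (ImageID INTEGER, TagID INTEGER);
    INSERT INTO Image VALUES (1, 'img1'), (2, 'img2'), (3, 'img3'), (4, 'img4');
    INSERT INTO Tag VALUES (1, 'green'), (2, 'mountain'), (3, 'background'),
                           (4, 'landscape'), (5, 'digital');
    INSERT INTO TagRelationship VALUES
        (1, 1), (1, 2), (1, 4),          -- img1: green, mountain, landscape
        (2, 1), (2, 2), (2, 3), (2, 5),  -- img2: green, mountain, background, digital
        (3, 3), (3, 4),                  -- img3: background, landscape only
        (4, 1), (4, 2), (4, 3), (4, 4);  -- img4: green, mountain, background, landscape
""")

# relevance = number of optional tags (background, landscape) the image has
RANKED_SEARCH = """
    SELECT Image.key,
           COUNT(CASE WHEN Tag.tag IN ('background', 'landscape') THEN 1 END) AS relevance
      FROM Image
      JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
      JOIN Tag ON TagRelationship.TagID = Tag.ID
     GROUP BY Image.key
    HAVING COUNT(CASE WHEN Tag.tag IN ('green', 'mountain') THEN 1 END) = 2
       AND COUNT(CASE WHEN Tag.tag IN ('digital', 'drawing') THEN 1 END) = 0
       AND COUNT(CASE WHEN Tag.tag IN ('background', 'landscape') THEN 1 END) > 0
     ORDER BY relevance DESC, Image.key
"""

ranked = conn.execute(RANKED_SEARCH).fetchall()
print(ranked)  # [('img4', 2), ('img1', 1)]
```

Here img2 is dropped for carrying a prohibited tag and img3 for lacking the mandatory ones, while img4 ranks above img1 because it has both optional tags.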

Answered by JustinDanielson on Sep 28 '22 at 12:09