I'm currently writing an application that allows one to store images, and then tag these images. I'm using Python and the Peewee ORM (http://charlesleifer.com/docs/peewee/), which is very similar to Django's ORM.
My data model looks like this (simplified):
class Image(BaseModel):
    key = CharField()

class Tag(BaseModel):
    tag = CharField()

class TagRelationship(BaseModel):
    relImage = ForeignKeyField(Image)
    relTag = ForeignKeyField(Tag)
Now, I understand conceptually how to query for all Images that have a given set of tags:
SELECT Image.key
FROM Image
INNER JOIN TagRelationship
ON Image.ID = TagRelationship.ImageID
INNER JOIN Tag
ON TagRelationship.TagID = Tag.ID
WHERE Tag.tag
IN ( 'A' , 'B' ) -- list of multiple tags
GROUP BY Image.key
HAVING COUNT(*) = 2 -- where 2 == the number of tags specified, above
However, I also want to be able to do more complex searches. Specifically, I'd like to be able to specify a list of "all tags" - i.e. an image must have all of the specified tags to be returned, along with a list of "any" and a list of "none".
EDIT: I'd like to clarify this a bit. Specifically, the above query is an "all tags"-style query. It returns Images that have all the given tags. I want to be able to specify something like: "Give me all images that have the tags (green, mountain), any one of the tags (background, landscape) but not the tags (digital, drawing)".
Now, ideally, I'd like this to be one SQL query, because pagination then becomes very easy with LIMIT and OFFSET. I've actually got an implementation working whereby I just load everything into Python sets and then use the various intersection operators. What I'm wondering is if there's a method of doing this all at once?
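To make the current approach concrete, the set-based version works roughly like the sketch below. The `search` function and the `tags_by_image` mapping (image key to set of tags) are hypothetical names for illustration, not the real code, and it assumes every image's tags have already been loaded into Python sets.

```python
def search(tags_by_image, all_tags=(), any_tags=(), none_tags=()):
    """Return the sorted keys of images matching the all/any/none tag lists."""
    all_tags, any_tags, none_tags = set(all_tags), set(any_tags), set(none_tags)
    matches = []
    for key, tags in tags_by_image.items():
        has_all = all_tags <= tags                       # every mandatory tag present
        has_any = not any_tags or bool(any_tags & tags)  # at least one optional tag
        has_none = not (none_tags & tags)                # no prohibited tag
        if has_all and has_any and has_none:
            matches.append(key)
    return sorted(matches)


# Hypothetical sample data: image key -> set of tags
images = {
    "img1": {"green", "mountain", "landscape"},
    "img2": {"green", "mountain", "background", "digital"},
    "img3": {"green", "landscape"},
}
result = search(
    images,
    all_tags=["green", "mountain"],
    any_tags=["background", "landscape"],
    none_tags=["digital", "drawing"],
)
print(result)
```

Pagination here means slicing the resulting list after everything has been loaded, which is why a single SQL query with LIMIT and OFFSET is more attractive.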
Also, for those interested, I've emailed the author of Peewee about how to represent the above query using Peewee, and he responded with the following solution:
Image.select(['key']).group_by('key').join(TagRelationship).join(Tag).where(tag__in=['tag1', 'tag2']).having('count(*) = 2')
Or, alternatively, a shorter version:
Image.filter(tagrelationship_set__relTag__tag__in=['tag1', 'tag2']).group_by(Image).having('count(*) = 2')
Thanks in advance for your time.
SELECT Image.key
FROM Image
JOIN TagRelationship
ON Image.ID = TagRelationship.ImageID
JOIN Tag
ON TagRelationship.TagID = Tag.ID
GROUP BY Image.key
HAVING SUM(Tag.tag IN (mandatory tags )) = N /*the number of mandatory tags*/
AND SUM(Tag.tag IN (optional tags )) > 0
AND SUM(Tag.tag IN (prohibited tags)) = 0
UPDATE
A more universally accepted version of the above query (converts the boolean results of the IN predicates into integers using CASE expressions):
SELECT Image.key
FROM Image
JOIN TagRelationship
ON Image.ID = TagRelationship.ImageID
JOIN Tag
ON TagRelationship.TagID = Tag.ID
GROUP BY Image.key
HAVING SUM(CASE WHEN Tag.tag IN (mandatory tags ) THEN 1 ELSE 0 END) = N /*the number of mandatory tags*/
AND SUM(CASE WHEN Tag.tag IN (optional tags ) THEN 1 ELSE 0 END) > 0
AND SUM(CASE WHEN Tag.tag IN (prohibited tags) THEN 1 ELSE 0 END) = 0
or with COUNTs instead of SUMs:
SELECT Image.key
FROM Image
JOIN TagRelationship
ON Image.ID = TagRelationship.ImageID
JOIN Tag
ON TagRelationship.TagID = Tag.ID
GROUP BY Image.key
HAVING COUNT(CASE WHEN Tag.tag IN (mandatory tags ) THEN 1 END) = N /*the number of mandatory tags*/
AND COUNT(CASE WHEN Tag.tag IN (optional tags ) THEN 1 END) > 0
AND COUNT(CASE WHEN Tag.tag IN (prohibited tags) THEN 1 END) = 0
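As a rough illustration, the COUNT version can be run from Python with the standard `sqlite3` module. This is a sketch under assumptions: the table and column names follow the SQL above rather than the Peewee model, `find_images` is a hypothetical helper, all three tag lists are assumed to be non-empty, and each image is assumed to carry a given tag at most once. LIMIT and OFFSET are added to show the pagination the question asks about.

```python
import sqlite3


def find_images(conn, all_tags, any_tags, none_tags, limit=20, offset=0):
    """Return keys of images having all, at least one, and none of the given tags."""

    def count_in(tags):
        # COUNT(CASE ...) counts the joined rows whose tag is in the list.
        marks = ", ".join("?" * len(tags))
        return f"COUNT(CASE WHEN Tag.tag IN ({marks}) THEN 1 END)"

    sql = f"""
        SELECT Image.key
        FROM Image
        JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
        JOIN Tag ON TagRelationship.TagID = Tag.ID
        GROUP BY Image.key
        HAVING {count_in(all_tags)} = ?
           AND {count_in(any_tags)} > 0
           AND {count_in(none_tags)} = 0
        ORDER BY Image.key
        LIMIT ? OFFSET ?
    """
    # Parameters are listed in the order their placeholders appear in the SQL.
    params = [*all_tags, len(all_tags), *any_tags, *none_tags, limit, offset]
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical sample data in an in-memory database
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Image (ID INTEGER PRIMARY KEY, key TEXT);
    CREATE TABLE Tag (ID INTEGER PRIMARY KEY, tag TEXT UNIQUE);
    CREATE TABLE TagRelationship (ImageID INTEGER, TagID INTEGER);
""")
sample = {
    "alps": ["green", "mountain", "landscape"],
    "valley": ["green", "mountain", "background", "landscape"],
    "render": ["green", "mountain", "background", "digital"],
    "meadow": ["green", "landscape"],
    "peak": ["green", "mountain"],
}
for key, tags in sample.items():
    image_id = conn.execute("INSERT INTO Image (key) VALUES (?)", (key,)).lastrowid
    for tag in tags:
        conn.execute("INSERT OR IGNORE INTO Tag (tag) VALUES (?)", (tag,))
        conn.execute(
            "INSERT INTO TagRelationship (ImageID, TagID) "
            "SELECT ?, ID FROM Tag WHERE tag = ?",
            (image_id, tag),
        )

wanted = (["green", "mountain"], ["background", "landscape"], ["digital", "drawing"])
first_page = find_images(conn, *wanted, limit=1, offset=0)
second_page = find_images(conn, *wanted, limit=1, offset=1)
print(first_page, second_page)
```

Only the placeholders are interpolated into the SQL string; the tag values themselves are passed as bound parameters, so user-supplied tags cannot alter the query.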
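The query below is built from two halves joined by UNION ALL. The top half gets the images that have all of the mandatory tags and none of the prohibited ones. The bottom half gets the images that have at least one of the optional tags. The bottom half deliberately has no GROUP BY, because I want to know if an image appears twice: if it does, it has both background and landscape. Ordering by the count puts pictures with both the background and landscape tags at the top. So pictures tagged green, mountain, background and landscape will be the most relevant, followed by pictures tagged green, mountain and either background or landscape. Note that an image matching only one of the two halves is still returned, just with a lower relevance.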
SELECT matches.key, COUNT(*) AS relevance
FROM
    (SELECT good.key
     FROM
         -- good image candidates
         (SELECT Image.ID, Image.key
          FROM Image
          WHERE Image.key NOT IN
              -- bad images
              (SELECT DISTINCT Image.key -- DISTINCT removes duplicates, reducing the size of the set
               FROM Image
               INNER JOIN TagRelationship
               ON Image.ID = TagRelationship.ImageID
               INNER JOIN Tag
               ON TagRelationship.TagID = Tag.ID
               WHERE Tag.tag
               IN ('digital', 'drawing'))) AS good
     INNER JOIN TagRelationship
     ON good.ID = TagRelationship.ImageID
     INNER JOIN Tag
     ON TagRelationship.TagID = Tag.ID
     WHERE Tag.tag
     IN ('green', 'mountain')
     GROUP BY good.key
     HAVING COUNT(*) = 2 -- the number of mandatory tags: we need green AND mountain
     UNION ALL
     -- get all images with one of the following 2 tags, one row per matching tag
     SELECT Image.key
     FROM Image
     INNER JOIN TagRelationship
     ON Image.ID = TagRelationship.ImageID
     INNER JOIN Tag
     ON TagRelationship.TagID = Tag.ID
     WHERE Tag.tag
     IN ('background', 'landscape')
    ) AS matches
GROUP BY matches.key
ORDER BY relevance DESC
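The query above ranks images but does not drop those that satisfy only one half. If strict matching is wanted as well, one possible variation (a sketch, not the query from this answer) keeps the relevance idea but enforces the mandatory and prohibited lists with conditional counts in HAVING. It uses Python's standard `sqlite3` module with hypothetical sample data, and it assumes each image carries a given tag at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Image (ID INTEGER PRIMARY KEY, key TEXT);
    CREATE TABLE Tag (ID INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE TagRelationship (ImageID INTEGER, TagID INTEGER);
    INSERT INTO Image VALUES (1, 'alps'), (2, 'valley'), (3, 'render'), (4, 'meadow');
    INSERT INTO Tag VALUES (1, 'green'), (2, 'mountain'), (3, 'background'),
                           (4, 'landscape'), (5, 'digital');
    INSERT INTO TagRelationship VALUES
        (1, 1), (1, 2), (1, 4),          -- alps: green, mountain, landscape
        (2, 1), (2, 2), (2, 3), (2, 4),  -- valley: all four wanted tags
        (3, 1), (3, 2), (3, 3), (3, 5),  -- render: also tagged digital
        (4, 1), (4, 4);                  -- meadow: no mountain tag
""")

# relevance = how many of the optional tags the image carries
ranked = conn.execute("""
    SELECT Image.key,
           COUNT(CASE WHEN Tag.tag IN ('background', 'landscape') THEN 1 END) AS relevance
    FROM Image
    JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
    JOIN Tag ON TagRelationship.TagID = Tag.ID
    GROUP BY Image.key
    HAVING COUNT(CASE WHEN Tag.tag IN ('green', 'mountain') THEN 1 END) = 2
       AND COUNT(CASE WHEN Tag.tag IN ('digital', 'drawing') THEN 1 END) = 0
       AND COUNT(CASE WHEN Tag.tag IN ('background', 'landscape') THEN 1 END) > 0
    ORDER BY relevance DESC, Image.key
""").fetchall()
print(ranked)
```

With this sample data, the image carrying both optional tags sorts ahead of the one carrying only one of them, and the images that miss a mandatory tag or carry a prohibited one are left out entirely.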