Using same column multiple times in WHERE clause

Tags:

sql

postgresql

relational-division

I have the following table structure.

USERS

USERS data

PROPERTY_VALUE

PROPERTY_VALUE data

PROPERTY_NAME

PROPERTY_NAME data

USER_PROPERTY_MAP

USER_PROPERTY_MAP data

I am trying to retrieve users from the users table who have matching properties in the property_value table.

A single user can have multiple properties. The example data here has 2 properties for user '1', but there can be more than 2. I want to use all those user properties in the WHERE clause.

This query works if a user has a single property, but it fails for more than one property:

SELECT * FROM users u
INNER JOIN user_property_map upm ON u.id = upm.user_id
INNER JOIN property_value pv ON upm.property_value_id = pv.id
INNER JOIN property_name pn ON pv.property_name_id = pn.id
WHERE (pn.id = 1 AND pv.id IN (SELECT id FROM property_value WHERE value like '101')
AND pn.id = 2 AND pv.id IN (SELECT id FROM property_value WHERE value like '102')) and u.user_name = 'user1' and u.city = 'city1'

I understand that since the query has pn.id = 1 AND pn.id = 2 it will always fail, because pn.id can be either 1 or 2 but not both at the same time. So how can I rewrite it to make it work for n properties?

In the example data above there is only one user, with id = 1, that has both matching properties used in the WHERE clause. The query should return a single record with all columns of the USERS table.

To clarify my requirements

I am working on an application that has a users list page on the UI listing all users in the system. This list has information like user id, user name, city etc. (all columns of the USERS table). Users can have properties as detailed in the database model above.

The users list page also provides functionality to search users based on these properties. When searching for users with 2 properties, 'property1' and 'property2', the page should fetch and display only matching rows. Based on the test data above, only user '1' fits the bill.

A user with 4 properties including 'property1' and 'property2' qualifies. But a user with only one property 'property1' would be excluded due to the missing 'property2'.

asked Nov 17 '17 13:11

ivish

2 Answers

This is a case of relational-division. I added the tag.

Indexes

I assume a PK or UNIQUE constraint on USER_PROPERTY_MAP(property_value_id, user_id), with columns in this order to make my queries fast. Related:

Is a composite index also good for queries on the first field?

You should also have an index on PROPERTY_VALUE(value, property_name_id, id). Again, columns in this order. Add the last column id only if you get index-only scans out of it.

For a given number of properties

There are many ways to solve it. This should be one of the simplest and fastest for exactly two properties:

SELECT u.*
FROM   users             u
JOIN   user_property_map up1 ON up1.user_id = u.id
JOIN   user_property_map up2 USING (user_id)
WHERE  up1.property_value_id =
      (SELECT id FROM property_value WHERE property_name_id = 1 AND value = '101')
AND    up2.property_value_id =
      (SELECT id FROM property_value WHERE property_name_id = 2 AND value = '102')
-- AND    u.user_name = 'user1'  -- more filters?
-- AND    u.city = 'city1'

Not visiting table PROPERTY_NAME, since you seem to have resolved property names to IDs already, according to your example query. Else you could add a join to PROPERTY_NAME in each subquery.
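
The same pattern extends to any fixed number of properties by adding one self-join and one filter per property. Here is a minimal sketch that generates such a query, written in Python with the standard-library sqlite3 module so it runs without a server. SQLite instead of Postgres, the sample rows, and the helper name find_users are assumptions made for illustration only. The joins use an explicit ON condition rather than USING, which is equivalent here.

```python
import sqlite3

# Hypothetical sample data: user 1 has both properties, user 2 only the first.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT, city TEXT);
CREATE TABLE property_value (id INTEGER PRIMARY KEY, property_name_id INTEGER, value TEXT);
CREATE TABLE user_property_map (
    user_id INTEGER, property_value_id INTEGER,
    PRIMARY KEY (property_value_id, user_id));
INSERT INTO users VALUES (1, 'user1', 'city1'), (2, 'user2', 'city2');
INSERT INTO property_value VALUES (1, 1, '101'), (2, 2, '102');
INSERT INTO user_property_map VALUES (1, 1), (1, 2), (2, 1);
""")


def find_users(conn, props):
    """Return users that have every (property_name_id, value) pair in props."""
    if not props:
        raise ValueError("need at least one property filter")
    n = len(props)
    # One self-join of user_property_map per property filter.
    joins = "\n".join(
        f"JOIN user_property_map up{i} ON up{i}.user_id = u.id" for i in range(n))
    # One scalar subquery per filter, resolving the pair to a property_value id.
    filters = "\nAND ".join(
        f"up{i}.property_value_id = (SELECT id FROM property_value"
        f" WHERE property_name_id = ? AND value = ?)" for i in range(n))
    sql = f"SELECT u.* FROM users u\n{joins}\nWHERE {filters}\nORDER BY u.id"
    params = [x for pair in props for x in pair]  # flatten in placeholder order
    return conn.execute(sql, params).fetchall()


both = find_users(conn, [(1, "101"), (2, "102")])
only_first = find_users(conn, [(1, "101")])
print(both)
print(only_first)
```

Values are passed as bound parameters; only the number of joins is built into the SQL text.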

We have assembled an arsenal of techniques under this related question:

How to filter SQL results in a has-many-through relation

For an unknown number of properties

@Mike and @Valera have very useful queries in their respective answers. To make this even more dynamic:

WITH input(property_name_id, value) AS (
      VALUES  -- provide n rows with input parameters here
        (1, '101')
      , (2, '102')
      -- more?
      ) 
SELECT *
FROM   users u
JOIN  (
   SELECT up.user_id AS id
   FROM   input
   JOIN   property_value    pv USING (property_name_id, value)
   JOIN   user_property_map up ON up.property_value_id = pv.id
   GROUP  BY 1
   HAVING count(*) = (SELECT count(*) FROM input)
   ) sub USING (id);

Only add / remove rows from the VALUES expression. Or remove the WITH clause and the JOIN for no property filters at all.
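
To illustrate how an application might fill in the VALUES rows, here is a hedged sketch in Python with the standard-library sqlite3 module. SQLite rather than Postgres, the sample rows, and the helper name find_users_counting are assumptions of this example. Note that comparing count(*) with the number of input rows relies on the unique constraint on user_property_map and on the input rows being distinct.

```python
import sqlite3

# Hypothetical sample data: user 1 has both properties, user 2 only the first.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT, city TEXT);
CREATE TABLE property_value (id INTEGER PRIMARY KEY, property_name_id INTEGER, value TEXT);
CREATE TABLE user_property_map (
    user_id INTEGER, property_value_id INTEGER,
    PRIMARY KEY (property_value_id, user_id));
INSERT INTO users VALUES (1, 'user1', 'city1'), (2, 'user2', 'city2');
INSERT INTO property_value VALUES (1, 1, '101'), (2, 2, '102');
INSERT INTO user_property_map VALUES (1, 1), (1, 2), (2, 1);
""")


def find_users_counting(conn, props):
    """Return users matching all (property_name_id, value) pairs by counting."""
    if not props:
        # No property filters: drop the WITH clause and the JOIN entirely.
        return conn.execute("SELECT * FROM users ORDER BY id").fetchall()
    rows = ", ".join("(?, ?)" for _ in props)  # one VALUES row per filter
    sql = f"""
    WITH input(property_name_id, value) AS (VALUES {rows})
    SELECT u.*
    FROM   users u
    JOIN  (
       SELECT up.user_id AS id
       FROM   input
       JOIN   property_value    pv USING (property_name_id, value)
       JOIN   user_property_map up ON up.property_value_id = pv.id
       GROUP  BY up.user_id
       HAVING count(*) = (SELECT count(*) FROM input)  -- matched every filter
       ) sub USING (id)
    ORDER  BY u.id
    """
    params = [x for pair in props for x in pair]
    return conn.execute(sql, params).fetchall()


both = find_users_counting(conn, [(1, "101"), (2, "102")])
partial = find_users_counting(conn, [(1, "101"), (2, "999")])
everyone = find_users_counting(conn, [])
print(both, partial, everyone)
```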

The problem with this class of queries (counting all partial matches) is performance. My first query is less dynamic, but typically considerably faster, especially for bigger tables and a growing number of properties. (Just test with EXPLAIN ANALYZE.)

Best of both worlds?

This solution with a recursive CTE should be a good compromise, fast and dynamic:

WITH RECURSIVE input AS (
   SELECT count(*)     OVER () AS ct
        , row_number() OVER () AS rn
        , *
   FROM  (
      VALUES  -- provide n rows with input parameters here
        (1, '101')
      , (2, '102')
      -- more?
      ) i (property_name_id, value)
   )
 , rcte AS (
   SELECT i.ct, i.rn, up.user_id AS id
   FROM   input             i
   JOIN   property_value    pv USING (property_name_id, value)
   JOIN   user_property_map up ON up.property_value_id = pv.id
   WHERE  i.rn = 1

   UNION ALL
   SELECT i.ct, i.rn, up.user_id
   FROM   rcte              r
   JOIN   input             i ON i.rn = r.rn + 1
   JOIN   property_value    pv USING (property_name_id, value)
   JOIN   user_property_map up ON up.property_value_id = pv.id
                              AND up.user_id = r.id
   )
SELECT u.*
FROM   rcte  r
JOIN   users u USING (id)
WHERE  r.ct = r.rn;          -- has all matches

dbfiddle here

The manual about recursive CTEs.

The added complexity does not pay for small tables where the additional overhead outweighs any benefit or the difference is negligible to begin with. But it scales much better and is increasingly superior to "counting" techniques with growing tables and a growing number of property filters.

Counting techniques have to visit all rows in user_property_map for all given property filters, while this query (as well as the 1st query) can eliminate irrelevant users early.
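
To make the early elimination concrete, here is a simplified sketch of the recursive approach, again in Python with the standard-library sqlite3 module. It is a variation, not the query above: the row number rn is supplied by the caller instead of row_number() OVER (), and the total count is passed as a parameter instead of count(*) OVER (). SQLite, the sample rows, and the helper name find_users_recursive are assumptions of this example.

```python
import sqlite3

# Hypothetical sample data: user 1 has both properties, user 2 only the first.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT, city TEXT);
CREATE TABLE property_value (id INTEGER PRIMARY KEY, property_name_id INTEGER, value TEXT);
CREATE TABLE user_property_map (
    user_id INTEGER, property_value_id INTEGER,
    PRIMARY KEY (property_value_id, user_id));
INSERT INTO users VALUES (1, 'user1', 'city1'), (2, 'user2', 'city2');
INSERT INTO property_value VALUES (1, 1, '101'), (2, 2, '102');
INSERT INTO user_property_map VALUES (1, 1), (1, 2), (2, 1);
""")


def find_users_recursive(conn, props):
    """Apply filters one at a time; only users that passed step n reach n + 1."""
    if not props:
        raise ValueError("need at least one property filter")
    rows = ", ".join("(?, ?, ?)" for _ in props)  # (rn, property_name_id, value)
    sql = f"""
    WITH RECURSIVE input(rn, property_name_id, value) AS (VALUES {rows})
     , rcte(rn, id) AS (
       SELECT i.rn, up.user_id              -- candidates from the first filter
       FROM   input i
       JOIN   property_value    pv ON pv.property_name_id = i.property_name_id
                                  AND pv.value = i.value
       JOIN   user_property_map up ON up.property_value_id = pv.id
       WHERE  i.rn = 1
       UNION ALL
       SELECT i.rn, up.user_id              -- survivors of the next filter
       FROM   rcte r
       JOIN   input i ON i.rn = r.rn + 1
       JOIN   property_value    pv ON pv.property_name_id = i.property_name_id
                                  AND pv.value = i.value
       JOIN   user_property_map up ON up.property_value_id = pv.id
                                  AND up.user_id = r.id
       )
    SELECT u.*
    FROM   rcte r
    JOIN   users u ON u.id = r.id
    WHERE  r.rn = ?                         -- reached the last filter
    ORDER  BY u.id
    """
    params = [x for rn, (name_id, value) in enumerate(props, start=1)
              for x in (rn, name_id, value)]
    params.append(len(props))
    return conn.execute(sql, params).fetchall()


both = find_users_recursive(conn, [(1, "101"), (2, "102")])
reordered = find_users_recursive(conn, [(2, "102"), (1, "101")])
only_first = find_users_recursive(conn, [(1, "101")])
print(both, reordered, only_first)
```

Because the caller numbers the rows, the order of the list decides which filter is evaluated first, so the most selective filter can simply be placed at the front.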

Optimizing performance

With current table statistics (reasonable settings, autovacuum running), Postgres has knowledge about "most common values" in each column and will reorder joins in the 1st query to evaluate the most selective property filters first (or at least not the least selective ones). Up to a certain limit: join_collapse_limit. Related:

Postgresql join_collapse_limit and time for query planning
Why does a slight change in the search term slow down the query so much?

This "deus-ex-machina" intervention is not possible with the 3rd query (recursive CTE). To help performance (possibly a lot) you have to place more selective filters first yourself. But even with the worst-case ordering it will still outperform counting queries. Related:

Check statistics targets in PostgreSQL

Much more gory details:

PostgreSQL partial index unused when created on a table with existing data

More explanation in the manual:

Statistics Used by the Planner

answered Sep 19 '22 09:09

Erwin Brandstetter

SELECT * FROM users u
INNER JOIN user_property_map upm ON u.id = upm.user_id
INNER JOIN property_value pv ON upm.property_value_id = pv.id
INNER JOIN property_name pn ON pv.property_name_id = pn.id
WHERE (pn.id = 1 AND pv.id IN (SELECT id FROM property_value WHERE value like '101'))
OR (pn.id = 2 AND pv.id IN (SELECT id FROM property_value WHERE value like '102'))
OR (...)
OR (...)

You can't use AND because there is no case where id is both 1 and 2 for the SAME ROW; the WHERE condition is evaluated for each row separately!

If you run a simple test, like

SELECT * FROM users where id=1 and id=2

you will get 0 results. To achieve that, use either

 id in (1,2)

or

 id=1 or id=2

That query can be optimised more but this is a good start I hope.
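
A quick way to check the per-row behaviour is to run both variants on a tiny data set. The sketch below uses Python's standard-library sqlite3 module with made-up rows (user 1 has both properties, user 2 only the first), and filters on pv.property_name_id directly instead of joining property_name. All of this is an assumption for illustration. It also shows one caveat: OR alone matches user 2 as well, so an additional step (for example counting the matches per user) is still needed to keep only users that have all properties.

```python
import sqlite3

# Hypothetical sample data: user 1 has both properties, user 2 only the first.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT, city TEXT);
CREATE TABLE property_value (id INTEGER PRIMARY KEY, property_name_id INTEGER, value TEXT);
CREATE TABLE user_property_map (
    user_id INTEGER, property_value_id INTEGER,
    PRIMARY KEY (property_value_id, user_id));
INSERT INTO users VALUES (1, 'user1', 'city1'), (2, 'user2', 'city2');
INSERT INTO property_value VALUES (1, 1, '101'), (2, 2, '102');
INSERT INTO user_property_map VALUES (1, 1), (1, 2), (2, 1);
""")

base = """
SELECT u.id, pv.property_name_id, pv.value
FROM users u
JOIN user_property_map upm ON u.id = upm.user_id
JOIN property_value pv ON upm.property_value_id = pv.id
WHERE {cond}
ORDER BY u.id, pv.property_name_id
"""
first = "(pv.property_name_id = 1 AND pv.value = '101')"
second = "(pv.property_name_id = 2 AND pv.value = '102')"

# AND: no single joined row can carry both properties, so nothing matches.
and_rows = conn.execute(base.format(cond=f"{first} AND {second}")).fetchall()

# OR: every row carrying either property matches, including user 2's only row.
or_rows = conn.execute(base.format(cond=f"{first} OR {second}")).fetchall()

print(and_rows)
print(or_rows)
```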

answered Sep 17 '22 09:09

MiloBellano
