Building a search engine for an apartment site and I'm not sure how to index the apartments table.

Example queries:

...WHERE city_id = 1 AND size > 500 AND rooms = 2
...WHERE area_id = 2 AND ad_type = 'agent' AND price BETWEEN 10000 AND 14000
...WHERE area_id = 2 OR area_id = 4 AND published_at > '2016-01-01' AND ad_type = 1

As you can see, the columns can vary a lot, and the number of columns in the WHERE clause can be up to 10, or possibly even more.

Should I index all of them?
Only the most common ones?


What to index on queries with lots of columns in the WHERE clause

2 Answers

You have to figure out which WHERE clauses you are going to use with this table, how often each will occur, and how selective each condition will be.

Don't index for queries that occur seldom unless you have to.
Use multicolumn indexes, starting with those columns that will occur in an = comparison.
Concerning the order of columns in a multicolumn index, start with those columns that will be used in a query by themselves (an index can be used for a query with only some of its columns, provided they are at the beginning of the index).
You might omit columns with low selectivity, like gender.
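
One way to get a rough idea of selectivity is to look at the planner statistics. This is only a sketch: it assumes the table lives in the public schema and that you are allowed to run ANALYZE on it.

```sql
-- Refresh the planner statistics for the table first.
ANALYZE apartments;

-- n_distinct > 0: estimated number of distinct values.
-- n_distinct < 0: minus the fraction of rows that are distinct (-1 means unique).
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE schemaname = 'public'      -- assumption: default schema
  AND tablename  = 'apartments'
ORDER BY attname;
```

Columns with only a handful of distinct values are the ones you might leave out of a multicolumn index.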

For example, with your above queries, if they are all frequent and all columns are selective, these indexes would be good:

... ON apartments (city_id, rooms, size)

... ON apartments (area_id, ad_type, price)

... ON apartments (area_id, ad_type, published_at)

These indexes could also be used for WHERE clauses with only area_id or city_id in them.
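
Spelled out in full, that could look roughly like the following. The index names are made up for illustration, and whether the planner actually picks an index depends on your data and statistics, so check with EXPLAIN.

```sql
-- Hypothetical index names; the column lists are the ones suggested above.
CREATE INDEX apartments_city_rooms_size_idx
    ON apartments (city_id, rooms, size);
CREATE INDEX apartments_area_type_price_idx
    ON apartments (area_id, ad_type, price);
CREATE INDEX apartments_area_type_published_idx
    ON apartments (area_id, ad_type, published_at);

-- Equality columns (city_id, rooms) come first, the range column (size) last.
EXPLAIN
SELECT * FROM apartments
WHERE city_id = 1 AND rooms = 2 AND size > 500;

-- Only the leading column is constrained; the first index is still usable.
EXPLAIN
SELECT * FROM apartments
WHERE city_id = 1;
```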

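It is bad to have too many indexes: each one takes up space and has to be maintained whenever the table is modified, which slows down writes.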

If the above method would lead to too many indexes, e.g. because the user can pick arbitrary columns for the WHERE clause, it is better to index individual columns or occasionally pairs of columns that regularly go together.

That way PostgreSQL can pick a bitmap index scan to combine several indexes for one query. That is less efficient than a regular index scan, but usually better than a sequential scan.
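
A minimal sketch of the single-column approach, with made-up index names and assuming area_id, rooms and price are plain numeric columns:

```sql
-- One small index per frequently filtered column (names are hypothetical).
CREATE INDEX apartments_area_id_idx ON apartments (area_id);
CREATE INDEX apartments_rooms_idx   ON apartments (rooms);
CREATE INDEX apartments_price_idx   ON apartments (price);

-- If the planner decides to combine indexes, the plan shows a BitmapAnd node
-- over several Bitmap Index Scan nodes, followed by a Bitmap Heap Scan.
EXPLAIN
SELECT * FROM apartments
WHERE area_id = 2
  AND rooms = 2
  AND price BETWEEN 10000 AND 14000;
```

The planner may also decide that a single index is selective enough and apply the remaining conditions as a filter, so the exact plan depends on your data.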

104

answered Oct 05 '22 19:10

Laurenz Albe

Postgres 9.6 provides a new extension to address your conundrum precisely:

bloom index

It comes from the same authors who brought trigram indexes and text search to Postgres (among other things).

A single bloom index on all involved columns works well for any combination of them in the WHERE clause - even if not as well as separate btree indexes on each column. But a single index is much smaller and cheaper to maintain than many indexes. You'll have to weigh costs and benefits.

A bloom index excels for many index columns that can be combined in many ways.

I might combine a bloom index as "catch-all" with some tailored multicolumn btree indexes to optimize the most common combinations (along the guidelines provided by @Laurenz) and some single-column indexes on the most frequently queried columns.

Some more explanation:

Is a composite index also good for queries on the first field?

The feature is new and there are some important limitations. Quoting the manual:

Only operator classes for int4 and text are included with the module.

Only the = operator is supported for search. But it is possible to add support for arrays with union and intersection operations in the future.

So not for published_at, which looks like a date (but you could still extract an EPOCH and index that) and only for equality predicates.
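
A rough sketch of that idea, assuming published_at really is of type date and the bloom extension is installed. Instead of EXTRACT(EPOCH ...) it uses the day number since 1970-01-01, because subtracting two dates yields a plain integer that the int4 operator class can handle. The index name is made up.

```sql
-- Needs the bloom extension; area_id is assumed to be int4.
CREATE INDEX tbl_bloom_published_day ON tbl
USING bloom (area_id, (published_at - DATE '1970-01-01'));

-- Only equality can use the index, and the query has to repeat
-- the indexed expression exactly.
EXPLAIN
SELECT * FROM tbl
WHERE area_id = 2
  AND (published_at - DATE '1970-01-01') = (DATE '2016-01-01' - DATE '1970-01-01');
```

This does nothing for range conditions like published_at > '2016-01-01'; those still need a btree index.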

After creating the extension (once per DB):

CREATE EXTENSION bloom;

Create a bloom index:

CREATE INDEX tbl_bloomidx
ON tbl USING bloom (area_id, city_id, size, rooms, ad_type);  -- many more columns?
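
The signature size can be tuned with storage parameters. The following is only a sketch: the index name and the parameter values are illustrative, not recommendations, and all listed columns are assumed to be int4 or text.

```sql
-- length: signature length in bits per index entry (default 80).
-- col1 .. col32: number of bits generated for each index column (default 2).
CREATE INDEX tbl_bloomidx_tuned ON tbl
USING bloom (area_id, city_id, size, rooms, ad_type)
WITH (length = 80, col1 = 2, col2 = 2, col3 = 4);

-- Any subset of the indexed columns can use the index, equality only.
-- Expect a Bitmap Index Scan with a recheck, since bloom matches can be false positives.
EXPLAIN
SELECT * FROM tbl
WHERE city_id = 1 AND rooms = 2;
```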

And some others:

CREATE INDEX tbl_published_at ON tbl (published_at);
CREATE INDEX tbl_price ON tbl (price);
-- some popular combinations...

The manual has some examples comparing bloom, multicolumn and single-column btree indexes. Very insightful.
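
To weigh costs and benefits on your own data, you can compare index sizes and usage from the statistics views. A small sketch, assuming the table is called tbl as above:

```sql
-- Size and scan count of every index on the table, largest first.
SELECT indexrelname                                 AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan                                     AS scans_so_far
FROM pg_stat_user_indexes
WHERE relname = 'tbl'
ORDER BY pg_relation_size(indexrelid) DESC;
```

An index that stays at zero scans after a representative workload is a candidate for dropping.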

answered Oct 05 '22 18:10

Erwin Brandstetter


Tags:

indexing

postgresql

Frexuz
