I'm building a search engine for an apartment site and I'm not sure how to index the apartments table.
Example queries:
...WHERE city_id = 1 AND size > 500 AND rooms = 2
...WHERE area_id = 2 AND ad_type = 'agent' AND price BETWEEN 10000 AND 14000
...WHERE (area_id = 2 OR area_id = 4) AND published_at > '2016-01-01' AND ad_type = 1
As you can see, the columns can vary a lot, and the number of columns in the WHERE clause can be up to 10, or possibly even more.
Indexes also help filter tables, most obviously by optimizing WHERE clauses. For example, in Advantage Database Server the query "select * from employee where lastname = 'Jones'" results in a live cursor, and the SQL engine uses Advantage Optimized Filters (AOFs) to create the filter.
Multicolumn indexes (also known as composite indexes) are similar to standard indexes. Both store a sorted “table” of pointers to the main table. A multicolumn index, however, sorts its entries by several columns: by the first column, then by the second column within equal values of the first, and so on.
An index can be defined on more than one column of a table. For example, take a table of this form:
CREATE TABLE test2 (
  major int,
  minor int,
  name varchar
);
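As a sketch of where this leads: if queries on test2 usually filter on major and minor together, a single multicolumn index might serve them. The index name test2_mm_idx and the literal values below are only illustrative.

```sql
-- Table from the text above.
CREATE TABLE test2 (
  major int,
  minor int,
  name  varchar
);

-- One index over both columns, sorted by major first, then by minor.
CREATE INDEX test2_mm_idx ON test2 (major, minor);

-- A query of this shape can use the index for both conditions.
SELECT name FROM test2 WHERE major = 8 AND minor = 1;
```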
Columns with one or more of the following characteristics are good candidates for indexing: Values are unique in the column, or there are few duplicates. There is a wide range of values (good for regular indexes). There is a small range of values (good for bitmap indexes).
You have to figure out what WHERE clauses you are going to use with this query, how often each will occur, and how selective each condition will be.
Don't index for queries that occur seldom unless you have to.
Use multicolumn indexes, starting with those columns that will occur in an = comparison.
Concerning the order of columns in a multicolumn index, start with those columns that will be used in a query by themselves (an index can be used for a query with only some of its columns, provided they are at the beginning of the index).
You might omit columns with low selectivity, like gender.
For example, with your above queries, if they are all frequent and all columns are selective, these indexes would be good:
... ON apartments (city_id, rooms, size)
... ON apartments (area_id, ad_type, price)
... ON apartments (area_id, ad_type, published_at)
These indexes could also be used for WHERE clauses with only area_id or city_id in them.
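To illustrate the leading-column rule, here is a minimal sketch. The column types and the index name are assumptions, since the question does not show the table definition, and the planner may still prefer a sequential scan on a small or unanalyzed table.

```sql
-- Hypothetical, trimmed-down version of the table (types are guesses).
CREATE TABLE IF NOT EXISTS apartments (
  id           serial PRIMARY KEY,
  city_id      int,
  area_id      int,
  rooms        int,
  size         int,
  price        int,
  ad_type      text,
  published_at date
);

-- Equality columns first, the range column last.
CREATE INDEX apartments_city_rooms_size_idx
  ON apartments (city_id, rooms, size);

-- Can use all three index columns.
EXPLAIN SELECT * FROM apartments
WHERE city_id = 1 AND rooms = 2 AND size > 500;

-- Can still use the index through its leading column, city_id.
EXPLAIN SELECT * FROM apartments
WHERE city_id = 1;

-- Has no condition on the leading column, so this index is of little help.
EXPLAIN SELECT * FROM apartments
WHERE size > 500;
```

Comparing the three plans on realistically sized data is a quick way to check whether a given column order matches the queries you actually run.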
It is bad to have too many indexes.
If the above method would lead to too many indexes, e.g. because the user can pick arbitrary columns for the WHERE clause, it is better to index individual columns or occasionally pairs of columns that regularly go together.
That way PostgreSQL can pick a bitmap index scan to combine several indexes for one query. That is less efficient than a regular index scan, but usually better than a sequential scan.
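A rough sketch of that single-column approach follows. The table definition and index names are assumptions made for illustration; whether the planner actually combines the indexes depends on table size and statistics.

```sql
-- Hypothetical table definition (types are guesses).
CREATE TABLE IF NOT EXISTS apartments (
  id           serial PRIMARY KEY,
  city_id      int,
  area_id      int,
  rooms        int,
  size         int,
  price        int,
  ad_type      text,
  published_at date
);

-- One small index per frequently filtered column.
CREATE INDEX apartments_area_id_idx ON apartments (area_id);
CREATE INDEX apartments_rooms_idx   ON apartments (rooms);
CREATE INDEX apartments_price_idx   ON apartments (price);

-- With enough data, the plan for a query like this may show a BitmapAnd
-- node sitting on top of two or more Bitmap Index Scan nodes.
EXPLAIN SELECT * FROM apartments
WHERE area_id = 2 AND rooms = 3 AND price BETWEEN 10000 AND 14000;
```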
Postgres 9.6 provides a new extension to address your conundrum precisely: bloom, an index access method based on Bloom filters.
It comes from the same authors who brought trigram indexes and text search to Postgres (among other things).
A single bloom index on all involved columns works well for any combination of them in the WHERE clause, even if not as well as separate btree indexes on each column. But a single index is much smaller and cheaper to maintain than many indexes. You'll have to weigh costs and benefits.
A bloom index excels for many index columns that can be combined in many ways.
I might combine a bloom index as "catch-all" with some tailored multicolumn btree indexes to optimize the most common combinations (along the guidelines provided by @Laurenz) and some single column indexes on the most frequently queried columns.
Some more explanation:
The feature is new and there are some important limitations. Quoting the manual:
Only operator classes for int4 and text are included with the module.
Only the = operator is supported for search. But it is possible to add support for arrays with union and intersection operations in the future.
So not for published_at, which looks like a date (but you could still extract an EPOCH and index that), and only for equality predicates.
After creating the extension (once per DB):
CREATE EXTENSION bloom;
Create a bloom index:
CREATE INDEX tbl_bloomidx
ON tbl USING bloom (area_id, city_id, size, rooms, ad_type); -- many more columns?
And some others:
CREATE INDEX tbl_published_at ON tbl (published_at);
CREATE INDEX tbl_price ON tbl (price);
-- some popular combinations...
The manual has some examples comparing bloom, multicolumn and single-column btree indexes. Very insightful.
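For orientation, here is a small sketch of a bloom index with explicit storage parameters, in the style of the manual's examples. The table definition, index name, and parameter values are illustrative assumptions, not recommendations: length is the signature size in bits, and col1, col2, ... set the number of bits generated for each index column.

```sql
-- Once per database.
CREATE EXTENSION IF NOT EXISTS bloom;

-- Hypothetical table with integer columns only (bloom ships int4 and text operator classes).
CREATE TABLE IF NOT EXISTS tbl (
  id      serial PRIMARY KEY,
  area_id int,
  city_id int,
  rooms   int
);

-- Illustrative parameter values; tune them against your own data.
CREATE INDEX tbl_bloom_tuned_idx
  ON tbl USING bloom (area_id, city_id, rooms)
  WITH (length = 80, col1 = 2, col2 = 2, col3 = 4);

-- Equality predicates on any subset of the indexed columns can use the index.
EXPLAIN SELECT * FROM tbl WHERE city_id = 1 AND rooms = 2;
```

A longer signature generally lowers the false-positive rate at the cost of a larger index, so it is worth measuring both index size and query plans before settling on values.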