I could not reach any conclusive answers reading some of the existing posts on this topic.
I have certain data at 100 locations the for past 10 years. The table has about 800 million rows. I need to primarily generate yearly statistics for each location. Some times I need to generate monthly variation statistics and hourly variation statistics as well. I'm wondering if I should generate two indexes - one for location and another for year or generate one index on both location and year. My primary key currently is a serial number (Probably I could use location and timestamp as the primary key).
Thanks.
A multicolumn index should be created using the columns specified as the search condition then the columns to be grouped or sorted, in this order. In this case also, a multicolumn index consisting of columns C1, C2, C3, and C4 should be created, instead of creating two single-column indexes in columns C1 and C2.
You can create an index on more than one column of a table. This index is called a multicolumn index, a composite index, a combined index, or a concatenated index. A multicolumn index can have maximum 32 columns of a table. The limit can be changed by modifying the pg_config_manual.
If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table. A multiple-column index can be considered a sorted array, the rows of which contain values that are created by concatenating the values of the indexed columns.
Fortunately, PostgreSQL has the ability to combine multiple indexes (including multiple uses of the same index) to handle cases that cannot be implemented by single index scans. The system can form AND and OR conditions across several index scans.
Regardless of how many indices have you created on relation, only one of them will be used in a certain query (which one depends on query, statistics etc). So in your case you wouldn't get a cumulative advantage from creating two single column indices. To get most performance from index I would suggest to use composite index on (location, timestamp).
Note, that queries like ... WHERE timestamp BETWEEN smth AND smth
will not use the index above while queries like ... WHERE location = 'smth'
or ... WHERE location = 'smth' AND timestamp BETWEEN smth AND smth
will. It's because the first attribute in index is crucial for searching and sorting.
Don't forget to perform
ANALYZE;
after index creation in order to collect statistics.
Update: As @MondKin mentioned in comments certain queries can actually use several indexes on the same relation. For example, query with OR
clauses like a = 123 OR b = 456
(assuming that there are indexes for both columns). In this case postgres would perform bitmap index scans for both indexes, build a union of resulting bitmaps and use it for bitmap heap scan. In certain conditions the same scheme may be used for AND
queries but instead of union there would be an intersection.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With