Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multiple indexes vs single index on multiple columns in postgresql

I could not reach any conclusive answers reading some of the existing posts on this topic.

I have certain data at 100 locations the for past 10 years. The table has about 800 million rows. I need to primarily generate yearly statistics for each location. Some times I need to generate monthly variation statistics and hourly variation statistics as well. I'm wondering if I should generate two indexes - one for location and another for year or generate one index on both location and year. My primary key currently is a serial number (Probably I could use location and timestamp as the primary key).

Thanks.

like image 504
let_there_be_light Avatar asked Sep 02 '16 16:09

let_there_be_light


People also ask

Which one is good indexing in multiple-column or single column?

A multicolumn index should be created using the columns specified as the search condition then the columns to be grouped or sorted, in this order. In this case also, a multicolumn index consisting of columns C1, C2, C3, and C4 should be created, instead of creating two single-column indexes in columns C1 and C2.

Can we create index on multiple columns in PostgreSQL?

You can create an index on more than one column of a table. This index is called a multicolumn index, a composite index, a combined index, or a concatenated index. A multicolumn index can have maximum 32 columns of a table. The limit can be changed by modifying the pg_config_manual.

Is indexes allowed in multiple columns?

If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table. A multiple-column index can be considered a sorted array, the rows of which contain values that are created by concatenating the values of the indexed columns.

Can PostgreSQL use multiple indexes?

Fortunately, PostgreSQL has the ability to combine multiple indexes (including multiple uses of the same index) to handle cases that cannot be implemented by single index scans. The system can form AND and OR conditions across several index scans.


1 Answers

Regardless of how many indices have you created on relation, only one of them will be used in a certain query (which one depends on query, statistics etc). So in your case you wouldn't get a cumulative advantage from creating two single column indices. To get most performance from index I would suggest to use composite index on (location, timestamp).

Note, that queries like ... WHERE timestamp BETWEEN smth AND smth will not use the index above while queries like ... WHERE location = 'smth' or ... WHERE location = 'smth' AND timestamp BETWEEN smth AND smth will. It's because the first attribute in index is crucial for searching and sorting.

Don't forget to perform

ANALYZE; 

after index creation in order to collect statistics.

Update: As @MondKin mentioned in comments certain queries can actually use several indexes on the same relation. For example, query with OR clauses like a = 123 OR b = 456 (assuming that there are indexes for both columns). In this case postgres would perform bitmap index scans for both indexes, build a union of resulting bitmaps and use it for bitmap heap scan. In certain conditions the same scheme may be used for AND queries but instead of union there would be an intersection.

like image 147
Ildar Musin Avatar answered Oct 04 '22 04:10

Ildar Musin