Postgresql: multicolumn indexes vs single column index

Tags:

I have a table that will grow by 10M of rows every year.

The table have 10 columns, call them c1, c2, c3, ..., c10.

I will use the WHERE clause, potentially on 8 of them.

To be more specific: every time I will query the table, there will ALWAYS be a WHERE clause on column c10 (it's a date, I could search for equality or range).

The other 7 possible searchable columns, will not follow any schema. I could search for:

c10, c1, c2, c5
c10, c5
c10, c3
c10, c2, c6
c10, c2, c3, c5, c6

... and all other possible combinations.

So, in a WHERE clause, c10 will always be present and the others could be present in any combination (even not present at all).

What indexing strategy could improve performance in this scenario? I think that the right thing to do, is to create a single index for every column. There could be any performance improvements using a multicolumn index?

From what I know, you will gain performance with a multicolumn index on (c1, c2, c3), only for queries that use c1, c2, c3 or c1, c2, or c1, in that order. But like I said, the only thing I can assume in my scenario, is that c10 will always be present in a WHERE clause (it can also be the first clause, if that helps)

396

asked Apr 11 '18 13:04

Radar155

1 Answers

To answer the question of what kind of index should we use, we can create a simple test. first, we create a database, table, and index.

CREATE DATABASE index_test;

CREATE TABLE single_column(a int, b int, c int);
CREATE TABLE multi_column(a int, b int, c int);

CREATE INDEX single_column_a_idx ON single_column (a);
CREATE INDEX single_column_b_idx ON single_column (b);
CREATE INDEX single_column_c_idx ON single_column (c);

CREATE INDEX multi_column_idx ON multi_column (a, b, c);

populate the table with random data.

-- this function will be used for random number generation
CREATE OR REPLACE FUNCTION random_in_range(INTEGER, INTEGER) RETURNS INTEGER AS $$
SELECT floor(($1 + ($2 - $1 + 1) * random()))::INTEGER;
$$ LANGUAGE SQL;

INSERT INTO single_column(a, b, c)
SELECT random_in_range(1, 100),
    random_in_range(1, 100),
    random_in_range(1, 100)
FROM generate_series(1, 1000000);

INSERT INTO multi_column(a, b, c)
SELECT random_in_range(1, 100),
    random_in_range(1, 100),
    random_in_range(1, 100)
FROM generate_series(1, 1000000);

run the test.

EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3;
EXPLAIN ANALYZE SELECT * FROM single_column WHERE b < 3;
EXPLAIN ANALYZE SELECT * FROM single_column WHERE c < 3;

EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3;
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE b < 3;
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE c < 3;

EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3 AND b > 10 AND c <= 11;
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3 AND b > 10 AND c <= 11;

result

index_test=# EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3;
                                                               QUERY PLAN                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on single_column  (cost=3925.39..13926.49 rows=367608 width=12) (actual time=5.802..44.904 rows=20070 loops=1)
   Recheck Cond: (a < 3)
   Heap Blocks: exact=5269
   ->  Bitmap Index Scan on single_column_a_idx  (cost=0.00..3833.49 rows=367608 width=0) (actual time=4.018..4.019 rows=20070 loops=1)
         Index Cond: (a < 3)
 Planning Time: 0.325 ms
 Execution Time: 46.589 ms
(7 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM single_column WHERE b < 3;
                                                               QUERY PLAN                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on single_column  (cost=3925.39..13926.49 rows=367608 width=12) (actual time=6.630..26.814 rows=19902 loops=1)
   Recheck Cond: (b < 3)
   Heap Blocks: exact=5296
   ->  Bitmap Index Scan on single_column_b_idx  (cost=0.00..3833.49 rows=367608 width=0) (actual time=4.852..4.853 rows=19902 loops=1)
         Index Cond: (b < 3)
 Planning Time: 0.270 ms
 Execution Time: 28.762 ms
(7 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM single_column WHERE c < 3;
                                                               QUERY PLAN                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on single_column  (cost=3925.39..13926.49 rows=367608 width=12) (actual time=5.896..25.304 rows=19946 loops=1)
   Recheck Cond: (c < 3)
   Heap Blocks: exact=5274
   ->  Bitmap Index Scan on single_column_c_idx  (cost=0.00..3833.49 rows=367608 width=0) (actual time=4.125..4.126 rows=19946 loops=1)
         Index Cond: (c < 3)
 Planning Time: 0.270 ms
 Execution Time: 27.136 ms
(7 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3;
                                                             QUERY PLAN                                                 
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on multi_column  (cost=8569.39..18570.49 rows=367608 width=12) (actual time=7.760..67.173 rows=19938 loops=1)
   Recheck Cond: (a < 3)
   Heap Blocks: exact=5267
   ->  Bitmap Index Scan on multi_column_idx  (cost=0.00..8477.49 rows=367608 width=0) (actual time=6.008..6.008 rows=19938 loops=1)
         Index Cond: (a < 3)
 Planning Time: 0.564 ms
 Execution Time: 68.630 ms
(7 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM multi_column WHERE b < 3;
                                                           QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..13481.03 rows=18667 width=12) (actual time=1.451..135.028 rows=19897 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on multi_column  (cost=0.00..10614.33 rows=7778 width=12) (actual time=0.038..61.993 rows=6632 loops=3)
         Filter: (b < 3)
         Rows Removed by Filter: 326701
 Planning Time: 1.123 ms
 Execution Time: 136.128 ms
(8 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM multi_column WHERE c < 3;
                                                           QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..13627.63 rows=20133 width=12) (actual time=0.957..135.119 rows=19860 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on multi_column  (cost=0.00..10614.33 rows=8389 width=12) (actual time=0.035..66.760 rows=6620 loops=3)
         Filter: (c < 3)
         Rows Removed by Filter: 326713
 Planning Time: 0.225 ms
 Execution Time: 136.239 ms
(8 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3 AND b > 10 AND c <= 11;
                                                                   QUERY PLAN                                           
-------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on single_column  (cost=1424.66..5716.83 rows=2110 width=12) (actual time=21.694..26.123 rows=2000 loops=1)
   Recheck Cond: ((a < 3) AND (c <= 11))
   Filter: (b > 10)
   Rows Removed by Filter: 230
   Heap Blocks: exact=1833
   ->  BitmapAnd  (cost=1424.66..1424.66 rows=2338 width=0) (actual time=20.981..20.983 rows=0 loops=1)
         ->  Bitmap Index Scan on single_column_a_idx  (cost=0.00..230.43 rows=21067 width=0) (actual time=3.932..3.932 rows=20070 loops=1)
               Index Cond: (a < 3)
         ->  Bitmap Index Scan on single_column_c_idx  (cost=0.00..1192.92 rows=111000 width=0) (actual time=16.080..16.080 rows=110276 loops=1)
               Index Cond: (c <= 11)
 Planning Time: 1.812 ms
 Execution Time: 26.742 ms
(12 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3 AND b > 10 AND c <= 11;
                                                                 QUERY PLAN                                             
---------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using multi_column_idx on multi_column  (cost=0.42..642.38 rows=2071 width=12) (actual time=0.329..2.086 rows=1953 loops=1)
   Index Cond: ((a < 3) AND (b > 10) AND (c <= 11))
   Heap Fetches: 0
 Planning Time: 0.176 ms
 Execution Time: 2.165 ms
(5 rows)

Conclusion

single_column table will always use the index in any condition.

EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3; -- index used
EXPLAIN ANALYZE SELECT * FROM single_column WHERE b < 3; -- index used
EXPLAIN ANALYZE SELECT * FROM single_column WHERE c < 3; -- index used

EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3 AND b > 10 AND c <= 11; -- index used

Single column WHERE performed on multi_column table, the index will only be used if the column in the query is the same as the first column in the index definition.

EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3; -- index used
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE b < 3; -- index not used
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE c < 3; -- index not used

Although single_column table can use indexes on multi column WHERE, multi_column table is faster.
Although multi_column table can use indexes on single column WHERE, single_column table is faster.

190

answered Nov 15 '22 05:11

Edwin Sulaiman

Related questions
                            
                                Cumulative distinct count with Spark SQL
                            
                                Long running Stored procedure without keeping connection opened to Azure database
                            
                                Does order of columns of Multi-Column Indexes in where clause in MySQL matter?
                            
                                Parsing SQL query in Java
                            
                                Python - Sqlite - OperationalError: near "s": syntax error [duplicate]
                            
                                Convert Decimal to Time SSRS/SQL
                            
                                Update each row in table
                            
                                Difference between INNER JOIN and , COMMA [duplicate]
                            
                                How to insert and retrieve image from PostgreSql using C#
                            
                                How to avoid deadlock while mutiple process working on same table in sql
                            
                                How to insert JSON object to SQL Server 2016 as nvarchar
                            
                                Are conditions without ISNULL stinky?
                            
                                MySQL - How to find word with the most similar beginning
                            
                                ORACLE update with returning OLD and NEW values
                            
                                SQL Server Data Masking Minimum permissions needed to mask
                            
                                database in restore state
                            
                                java.sql.Timestamp wrong time parsing
                            
                                Transaction Isolations level in SQL Server
                            
                                Parent child hierarchy with order by on name
                            
                                How can I get ids of grouped rows?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Postgresql: multicolumn indexes vs single column index

Tags:

sql

indexing

postgresql

Radar155

People also ask

1 Answers

Conclusion

Edwin Sulaiman

Recent Activity

Donate For Us