I have a big table with more than 500 million rows. I'm trying to find the best indexing alternative to speed up the query time a bit. I suppose sorting according to timestamp slows the query time a lot. The table has 15 columns in it.
My Table has @ManyToOne
relation to other_table. Users can define a max result as well. Code looks like this:
// I'm showing the query itself here instead of the name of @NamedQuery inside the entity class.
TypedQuery<MyTable> query = em.createNamedQuery("SELECT m FROM my_table m WHERE m.other_table.id = :id AND m.city in :cities ORDER BY m.timestamp DESC", MyTable.class);
query.setParameter("id", id);
query.setParameter("cities", cities);
query.setMaxResults(number);
return query.getResultList();
What is the best alternative for this type of query? A composite index? Which index type is most suitable in this case?
We have an index like this but as I said, it takes a long time with this.
CREATE INDEX my_table_idx ON my_schema.my_table USING btree (other_table_id, timestamp DESC NULLS LAST, city)
EDIT 1:
This is the execution plan:
Limit (cost=2876886.98..2876887.03 rows=20 width=162) (actual time=101820.279..101820.284 rows=20 loops=1)
Buffers: shared hit=8063 read=635649 written=12198
-> Sort (cost=2876886.98..2879114.34 rows=890941 width=162) (actual time=101820.277..101820.278 rows=20 loops=1)
Sort Key: timestamp DESC
Sort Method: top-N heapsort Memory: 35kB
Buffers: shared hit=8063 read=635649 written=12198
-> Bitmap Heap Scan on my_table (cost=31640.64..2853179.36 rows=890941 width=162) (actual time=199.824..101221.260 rows=711774 loops=1)
Recheck Cond: ((m_other_table_id = '14b713d5-fb1a-4dbd-c013-fat4a7f6c8e3'::uuid) AND (m_city_id = 3))
Rows Removed by Index Recheck: 28920837
Heap Blocks: exact=23535 lossy=615808
Buffers: shared hit=8060 read=635649 written=12198
-> Bitmap Index Scan on my_table_idx (cost=0.00..31417.90 rows=890941 width=0) (actual time=189.011..189.012 rows=711777 loops=1)
Index Cond: ((m_other_table_id = '14b713d5-fb1a-4dbd-c013-fat4a7f6c8e3'::uuid) AND (m_city_id = 3))
Buffers: shared hit=90 read=4276
Planning time: 0.198 ms
Execution time: 101821.109 ms
and these are the indexes we have:
CREATE INDEX my_table_idx ON my_schema.my_table USING btree (other_table_id, timestamp DESC NULLS LAST, city)
CREATE UNIQUE INDEX my_table_prev_id_idx ON my_schema.my_table USING btree (m_prev_id)
CREATE INDEX my_table_other_table_fk_idx ON my_schema.my_table USING btree (m_other_table_id)
CREATE UNIQUE INDEX my_table_pkey ON my_schema.my_table USING btree (m_id)
CREATE INDEX my_table_track_fk_idx ON my_schema.my_table USING btree (m_track_id)
EDIT 2:
I wonder why parallel workers are not showing on my execution plan. I have these settings configured:
max_worker_processes = 6;
max_parallel_workers = 6;
max_parallel_workers_per_gather = 3;
So How Do You Fix Slow Queries in PostgreSQL? To speed up this particular PostgreSQL slow query, we need to know whether we really need all rows. If not, we should only get N of them by adding a LIMIT clause. If they are, we should use a cursor.
Some of the tricks we used to speed up SELECT-s in PostgreSQL: LEFT JOIN with redundant conditions, VALUES, extended statistics, primary key type conversion, CLUSTER, pg_hint_plan + bonus.
Disk Access. PostgreSQL attempts to do a lot of its work in memory, and spread out writing to disk to minimize bottlenecks, but on an overloaded system with heavy writing, it's easily possible to see heavy reads and writes cause the whole system to slow as it catches up on the demands.
So a query is limited to 1 gigabyte (2^30) in size, minus 1 byte for a terminating null byte.
Based on this line from the plan:
Cond: ((m_other_table_id = '14b713d5-fb1a-4dbd-c013-fat4a7f6c8e3'::uuid) AND (m_city_id = 3))
your ideal index should be on (m_other_table_id, m_city_id)
. Not other_table_id
, not city
. The plan you showed doesn't really match the query, so it is hard to tell where the typos are - in the shown plan or in the shown query.
Also, since you order by timestamp
, we can add it to the index.
So, I would try the following index:
CREATE INDEX idx ON my_schema.my_table USING btree
(m_other_table_id, m_city_id, timestamp DESC)
The order of columns here is important.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With