 

Improving query speed: simple SELECT in big postgres table

I'm having trouble with the speed of a SELECT query on a Postgres database.

I have a table keyed by two integer columns, (int1, int2). The table has around 70 million rows.

I need to run two kinds of simple SELECT queries against this table:

SELECT * FROM table WHERE int1=X;
SELECT * FROM table WHERE int2=X;

Each of these selects returns around 10,000 of the 70 million rows. To make this as fast as possible, I thought of using two HASH indexes, one on each column. Unfortunately, the results are not that good:

                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on lec_sim  (cost=232.21..25054.38 rows=6565 width=36) (actual time=14.759..23339.545 rows=7871 loops=1)
   Recheck Cond: (lec2_id = 11782)
   ->  Bitmap Index Scan on lec_sim_lec2_hash_ind  (cost=0.00..230.56 rows=6565 width=0) (actual time=13.495..13.495 rows=7871 loops=1)
         Index Cond: (lec2_id = 11782)
 Total runtime: 23342.534 ms
(5 rows)

This is an EXPLAIN ANALYZE for one of these queries. It takes around 23 seconds; I expect to get this information in less than a second.
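The hash indexes look roughly like this (a sketch only; the index on lec2_id is named in the plan above, and the lec1_id index is assumed by analogy):

-- sketch of the hash index DDL; lec_sim_lec1_hash_ind is assumed, lec_sim_lec2_hash_ind appears in the plan
CREATE INDEX lec_sim_lec1_hash_ind ON lec_sim USING hash (lec1_id);
CREATE INDEX lec_sim_lec2_hash_ind ON lec_sim USING hash (lec2_id);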

These are some of the relevant parameters from the Postgres configuration:

work_mem = 128MB
shared_buffers = 2GB
maintenance_work_mem = 512MB
fsync = off
synchronous_commit = off
effective_cache_size = 4GB

Any help, comment or thought would be really appreciated.

Thank you in advance.

asked Nov 05 '12 by alexdemartos




1 Answer

Extracting my comments into an answer: the index lookup here was very fast -- all the time was spent retrieving the actual rows. 23 seconds / 7871 rows = 2.9 milliseconds per row, which is reasonable for retrieving data that's scattered across the disk subsystem. Seeks are slow; you can a) fit your dataset in RAM, b) buy SSDs, or c) organize your data ahead of time to minimize seeks.
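As an illustration of option (c), one way to organize the data ahead of time is to physically reorder the table with CLUSTER. This is a sketch, not something from the original answer: the table and column names follow the question's plan, and a btree index is created here because CLUSTER cannot use a hash index.

-- Sketch of option (c): rewrite the table so rows sharing a lec2_id are stored together,
-- reducing the random seeks seen in the bitmap heap scan above.
-- CLUSTER requires a btree (or other clusterable) index, so one is created first.
CREATE INDEX lec_sim_lec2_btree_ind ON lec_sim (lec2_id);
CLUSTER lec_sim USING lec_sim_lec2_btree_ind;
ANALYZE lec_sim;

A table can only be clustered on one ordering at a time, so this helps the lec2_id lookups while leaving the other column's lookups scattered; CLUSTER also takes an exclusive lock and must be re-run as the data changes.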

PostgreSQL 9.2 has a feature called index-only scans that allows it to (usually) answer queries without accessing the table. You can combine this with the btree index property of automatically maintaining order to make this query fast. You mention int1, int2, and two floats:

CREATE INDEX sometable_int1_floats_key ON sometable (int1, float1, float2);
CREATE INDEX sometable_int2_floats_key ON sometable (int2, float1, float2);

SELECT float1, float2 FROM sometable WHERE int1=<value>; -- uses int1 index
SELECT float1, float2 FROM sometable WHERE int2=<value>; -- uses int2 index
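To confirm the planner actually produces index-only scans with these indexes, you can check the plan with EXPLAIN (a sketch; table, column, and index names as above, and the literal value is arbitrary):

-- The plan should show "Index Only Scan using sometable_int1_floats_key".
-- If it reports many heap fetches, VACUUM the table so the visibility map
-- lets the scan skip the heap entirely.
EXPLAIN (ANALYZE, BUFFERS)
SELECT float1, float2 FROM sometable WHERE int1 = 42;

VACUUM ANALYZE sometable;

Index-only scans depend on the visibility map, so a freshly loaded or heavily updated table may still touch the heap until it has been vacuumed.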

Note also that this doesn't magically eliminate the disk seeks; it just moves them from query time to insert time. It also costs you storage space, since you're duplicating the data. Still, this is probably the trade-off you want.

answered Oct 14 '22 by willglynn