As we know, PostgreSQL's OFFSET requires scanning through all the rows up to the requested position, which makes it nearly useless for paginating huge result sets: it gets slower and slower as the OFFSET increases.
PG 8.4 now supports window functions. Instead of:
SELECT * FROM table ORDER BY somecol LIMIT 10 OFFSET 500
You can say:
SELECT * FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY somecol ASC) AS rownum FROM table) AS foo WHERE rownum > 500 AND rownum <= 510
Does the latter approach help us at all? Or do we have to keep using identifying columns and temp tables for large pagination?
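For context, the "identifying columns" alternative mentioned here is keyset (a.k.a. seek) pagination: remember the last key of the previous page and filter on it, so an index lookup replaces the row-skipping that OFFSET does. A minimal in-memory sketch, with a sorted list standing in for the table (the column and row data are invented for the example):

```python
# Keyset ("seek") pagination vs. OFFSET, simulated on a sorted list.
# The SQL equivalents would be roughly:
#   OFFSET:  SELECT * FROM t ORDER BY id LIMIT %s OFFSET %s
#   keyset:  SELECT * FROM t WHERE id > %s ORDER BY id LIMIT %s

rows = [{"id": i, "d1": "data %d" % i} for i in range(1, 1001)]  # sorted by id

def offset_page(rows, limit, offset):
    # conceptually scans past `offset` rows every time -- O(offset + limit)
    return rows[offset:offset + limit]

def keyset_page(rows, limit, last_id):
    # filters on the last seen key -- with an index, O(limit)
    return [r for r in rows if r["id"] > last_id][:limit]

page_a = offset_page(rows, 10, 500)
page_b = keyset_page(rows, 10, last_id=500)
assert page_a == page_b  # same page, very different cost at scale
```

The catch, of course, is that keyset pagination only supports "next page from here", not jumping to an arbitrary page number.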
In my case, the optimizer's estimated subtree cost says the ROW_NUMBER query is about 60% more efficient, and STATISTICS IO shows about 20% less CPU time. In real elapsed time, however, the ROW_NUMBER solution takes about 80% longer, so the GROUP BY approach wins in my case.
ROW_NUMBER() is a SQL ranking function that assigns a sequential number to each row within a partition, in the order given by the ORDER BY in its OVER clause: the first row of each partition gets 1, and the number increments for each subsequent row. Unlike RANK(), it assigns distinct numbers even to rows with identical values in the ordering column.
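To make that numbering rule concrete, here is a small pure-Python model of ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...). This is just an illustration of the semantics, not the database implementation, and the `dept`/`salary` columns are invented for the example:

```python
from itertools import groupby

def row_number(rows, partition_by, order_by):
    # Assigns 1, 2, 3, ... within each partition, restarting per partition.
    # Rows with tied ORDER BY values still get distinct numbers.
    out = []
    rows = sorted(rows, key=lambda r: (r[partition_by], r[order_by]))
    for _, group in groupby(rows, key=lambda r: r[partition_by]):
        for n, r in enumerate(group, start=1):
            out.append(dict(r, rownum=n))
    return out

data = [
    {"dept": "a", "salary": 300},
    {"dept": "a", "salary": 100},
    {"dept": "a", "salary": 100},  # duplicate value: still gets its own number
    {"dept": "b", "salary": 200},
]
numbered = row_number(data, "dept", "salary")
print([(r["dept"], r["salary"], r["rownum"]) for r in numbered])
# [('a', 100, 1), ('a', 100, 2), ('a', 300, 3), ('b', 200, 1)]
```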
I've constructed a test which compares OFFSET, cursors, and ROW_NUMBER(). My impression that ROW_NUMBER() would be consistent in speed regardless of where you are in the result set turns out to be correct. However, it is dramatically slower than either OFFSET or a CURSOR, which, as I also expected, are pretty much the same in speed, both degrading the further toward the end of the result set you go.
Results:
offset(100,100):    0.016359   scroll(100,100):    0.018393   rownum(100,100):    15.535614
offset(100,480000): 1.761800   scroll(100,480000): 1.781913   rownum(100,480000): 15.158601
offset(100,999900): 3.670898   scroll(100,999900): 3.664517   rownum(100,999900): 14.581068
The test script uses sqlalchemy to set up tables and 1000000 rows of test data. It then uses a psycopg2 cursor to execute each SELECT statement and fetch results with the three different methods.
from sqlalchemy import *

metadata = MetaData()
engine = create_engine('postgresql://scott:tiger@localhost/test', echo=True)
t1 = Table('t1', metadata,
    Column('id', Integer, primary_key=True),
    Column('d1', String(50)),
    Column('d2', String(50)),
    Column('d3', String(50)),
    Column('d4', String(50)),
    Column('d5', String(50))
)

if not engine.has_table('t1'):
    conn = engine.connect()
    t1.create(conn)

    # 1000000 rows
    for i in range(100):
        conn.execute(t1.insert(), [
            dict(
                ('d%d' % col, "data data data %d %d" % (col, (i * 10000) + j))
                for col in range(1, 6)
            )
            for j in xrange(1, 10001)
        ])

import time

def timeit(fn, count, *args):
    now = time.time()
    for i in xrange(count):
        fn(*args)
    total = time.time() - now
    print "%s(%s): %f" % (fn.__name__, ",".join(repr(x) for x in args), total)

# this is a raw psycopg2 connection.
conn = engine.raw_connection()

def offset(limit, offset):
    cursor = conn.cursor()
    cursor.execute("select * from t1 order by id "
                   "limit %d offset %d" % (limit, offset))
    cursor.fetchall()
    cursor.close()

def rownum(limit, offset):
    cursor = conn.cursor()
    cursor.execute("select * from (select *, "
                   "row_number() over (order by id asc) as rownum from t1) as foo "
                   "where rownum>=%d and rownum<%d" % (offset, limit + offset))
    cursor.fetchall()
    cursor.close()

def scroll(limit, offset):
    # a named cursor is a server-side (scrollable) cursor in psycopg2
    cursor = conn.cursor('foo')
    cursor.execute("select * from t1 order by id")
    cursor.scroll(offset)
    cursor.fetchmany(limit)
    cursor.close()

print
timeit(offset, 10, 100, 100)
timeit(scroll, 10, 100, 100)
timeit(rownum, 10, 100, 100)

print
timeit(offset, 10, 100, 480000)
timeit(scroll, 10, 100, 480000)
timeit(rownum, 10, 100, 480000)

print
timeit(offset, 10, 100, 999900)
timeit(scroll, 10, 100, 999900)
timeit(rownum, 10, 100, 999900)
Use a CURSOR for a large result set; it will be much faster. For small result sets the LIMIT/OFFSET construction works fine, but it has its limits.
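What the cursor buys you, conceptually, is streaming: the result set stays on the server and the client scrolls and pulls rows in chunks, instead of re-planning and re-skipping rows for every page. A rough stand-in for that scroll/fetchmany pattern, with a plain iterable playing the part of the server-side cursor (no database involved):

```python
from itertools import islice

def fetch_page(cursor_stream, limit, offset):
    # Mimics cursor.scroll(offset) followed by cursor.fetchmany(limit):
    # advance past `offset` rows, then take the next `limit` rows.
    it = iter(cursor_stream)
    next(islice(it, offset, offset), None)  # discard the first `offset` rows
    return list(islice(it, limit))

stream = iter(range(1, 1001))  # stands in for "select * from t1 order by id"
assert fetch_page(stream, 10, 500) == list(range(501, 511))
```

Note that the skipped rows are still consumed, which is why even the cursor approach degrades toward the end of a large result set.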
ROW_NUMBER is a nice thing, but not for pagination. You end up with bad performance because of sequential scans.