Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql Offset Behavior with json column

Tags:

postgresql

Using postgresql 9.4 we have a simple contacts table with (id text not null (as pk), blob json) to experiment with porting a couchdb crm database. We will eventually split out to more columns etc, and handle the data more idomatically for a rdbms, but that's besides the point for the time being.

There are approximately 100k rows.

I am aware that hardcore postgresql performance experts advise against using offset however I can accept a small performance penalty (happy with anything under 100msec)

SELECT id FROM couchcontacts OFFSET 10000 LIMIT 10 

As expected takes <10ms

SELECT blob->>'firstName' FROM couchcontacts LIMIT 10 

Also takes < 10ms (presume 10 json decode ops on blob column here)

SELECT blob->>'firstName' FROM couchcontacts OFFSET 10000 LIMIT 10 

Takes upwards of 10 seconds!! Noted inefficiencies of offset aside why is this presumably causing 10,010 json decode ops? As the projection has no side-effects I don't understand the reason this can't be fast?

Is this a limitation of json functionality being relatively new to postgres? and thus unable to determine ->> opereator isnt yielding side-effects?

Interesting rewriting the query to this bring it back under 10milliseconds

SELECT jsonblob->>'firstName' FROM couchdbcontacts WHERE id IN (SELECT id FROM couchcontacts OFFSET 10000 LIMIT 10)

Is there a way to ensure offset doesnt json decode the offsetted records? (i.e. don't execute the select projection)

"Limit  (cost=1680.31..1681.99 rows=10 width=32) (actual time=12634.674..12634.842 rows=10 loops=1)"
"  ->  Seq Scan on couchcontacts  (cost=0.00..17186.53 rows=102282 width=32) (actual time=0.088..12629.401 rows=10010 loops=1)"
"Planning time: 0.194 ms"
"Execution time: 12634.895 ms"
like image 964
Mâtt Frëëman Avatar asked Dec 11 '14 04:12

Mâtt Frëëman


1 Answers

I ran a few tests, and I'm seeing similar behaviors. Each of these have immaterial differences in performance:

  • select id ...
  • select indexed_field ...
  • select unindexed_field ...
  • select json_field ...
  • select * ...

This one, however, does show a difference in performance:

  • select json_field->>'key' ...

When the json_field is null, the performance impact is negligible. When it's empty, it degrades things very slightly. When it's filled in, it degrades noticeably. And when the field is loaded with larger data, it degrades materially.

In other words, Postgres seems to want to unserialize the json data for every row it's visiting. (Which is probably a bug, and one that's massively affecting RoR developers seeing how they use json.)

Fwiw, I noted that re-arranging the query so it uses a CTE will work around the problem:

with data as (
  select * from table offset 10000 limit 10
)
select json_field->>'key' from data;

(It might get an only-very-slightly better plan than the id IN (...) query that you highlighted.)

like image 190
Denis de Bernardy Avatar answered Oct 20 '22 07:10

Denis de Bernardy