I have the following query running for 3hours+:
UPDATE eop_201007
set coord_x = gi.x_etrs89, coord_y = gi.x_etrs89,gr_type = 4
from eop_201007 as eop, geoindex201001 as gi
where eop.cp7=gi.cp7 AND eop.gr_type=0;
eop table has 300k+ records, and gi table 100k+.
The cp7 field is indexed in both tables, and this is taking way too much time to complete.
Am I doing it wrong? How can I improve this?
Check this topic and use EXPLAIN to see what is going on. A better configuration for WAL might help as well, just check memory usage and write speed during the update.
Edit: And make sure no other transactions lock your table, you have to wait forever...
SELECT
relname,
*
FROM
pg_locks
JOIN pg_class ON pg_locks.relation = pg_class.oid
You don't need the "eop_201007 as eop" in your FROM. The following will work:
UPDATE eop_201007
set coord_x = gi.x_etrs89, coord_y = gi.x_etrs89,gr_type = 4
from geoindex201001 as gi
where eop_201007.cp7=gi.cp7 AND eop_201007.gr_type=0;
I think the extra eop is causing a cross-join (basically the cross product of the two tables which is HUGE) because it isn't constrained against the original eop table which is "automatically" in the FROM list already
If that doesn't fix things, here's some other thoughts:
You probably want to do a vacuum analyze on it first if you haven't. Make sure you have all your memory settings tunes in postgresql.conf. Working memory, shared buffers, etc can make a huge difference.
If this is a one-time thing, and not a nightly job, you should turn fsync off. Also, make sure (if you turn fsync off) you don't have too many checkpoint segments configured (24 or so will do) otherwise you'll pollute your disk cache.
As @Frank Heikens said, you should look at explain. Also check EXPLAIN ANALYZE (if you query does ever finish).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With