I have several large tables in Postgres 9.2 (millions of rows) where I need to generate a unique code based on the combination of two fields, 'source' (varchar) and 'id' (int). I can do this by generating row_numbers over the result of: <pre class="prettyprint"><code>SELECT source,id FROM tablename GROUP BY source,id </code></pre> but the results can take a while to process. It has been recommended that if the fields are indexed, and there are a proportionally small number of index values (which is my case), that a loose index scan may be a better option: http://wiki.postgresql.org/wiki/Loose_indexscan <pre class="prettyprint"><code>WITH RECURSIVE t AS (SELECT min(col) AS col FROM tablename UNION ALL SELECT (SELECT min(col) FROM tablename WHERE col > t.col) FROM t WHERE t.col IS NOT NULL) SELECT col FROM t WHERE col IS NOT NULL UNION ALL SELECT NULL WHERE EXISTS(SELECT * FROM tablename WHERE col IS NULL); </code></pre> The example operates on a single field though. Trying to return more than one field generates an error: subquery must return only one column. One possibility might be to try retrieving an entire ROW - e.g. <code>SELECT ROW(min(source),min(id)...</code>, but then I'm not sure what the syntax of the WHERE statement would need to look like to work with individual row elements. The question is: can the recursion-based code be modified to work with more than one column, and if so, how? I'm committed to using Postgres, but it looks like MySQL has implemented loose index scans for more than one column: http://dev.mysql.com/doc/refman/5.1/en/group-by-optimization.html <hr> As recommended, I'm attaching my EXPLAIN ANALYZE results. For my situation - where I'm selecting distinct values for 2 columns using GROUP BY, it's the following: <pre class="prettyprint"><code> HashAggregate (cost=1645408.44..1654099.65 rows=869121 width=34) (actual time=35411.889..36008.475 rows=1233080 loops=1) -> Seq Scan on tablename (cost=0.00..1535284.96 rows=22024696 width=34) (actual time=4413.311..25450.840 rows=22025768 loops=1) Total runtime: 36127.789 ms (3 rows) </code></pre> I don't know how to do a 2-column index scan (that's the question), but for purposes of comparison, using a GROUP BY on one column, I get: <pre class="prettyprint"><code> HashAggregate (cost=1590346.70..1590347.69 rows=99 width=8) (actual time=32310.706..32310.722 rows=100 loops=1) -> Seq Scan on tablename (cost=0.00..1535284.96 rows=22024696 width=8) (actual time=4764.609..26941.832 rows=22025768 loops=1) Total runtime: 32350.899 ms (3 rows) </code></pre> But for a loose index scan on one column, I get: <pre class="prettyprint"><code> Result (cost=181.28..198.07 rows=101 width=8) (actual time=0.069..1.935 rows=100 loops=1) CTE t -> Recursive Union (cost=1.74..181.28 rows=101 width=8) (actual time=0.062..1.855 rows=101 loops=1) -> Result (cost=1.74..1.75 rows=1 width=0) (actual time=0.061..0.061 rows=1 loops=1) InitPlan 1 (returns $1) -> Limit (cost=0.00..1.74 rows=1 width=8) (actual time=0.057..0.057 rows=1 loops=1) -> Index Only Scan using tablename_id on tablename (cost=0.00..38379014.12 rows=22024696 width=8) (actual time=0.055..0.055 rows=1 loops=1) Index Cond: (id IS NOT NULL) Heap Fetches: 0 -> WorkTable Scan on t (cost=0.00..17.75 rows=10 width=8) (actual time=0.017..0.017 rows=1 loops=101) Filter: (id IS NOT NULL) Rows Removed by Filter: 0 SubPlan 3 -> Result (cost=1.75..1.76 rows=1 width=0) (actual time=0.016..0.016 rows=1 loops=100) InitPlan 2 (returns $3) -> Limit (cost=0.00..1.75 rows=1 width=8) (actual time=0.016..0.016 rows=1 loops=100) -> Index Only Scan using tablename_id on tablename (cost=0.00..12811462.41 rows=7341565 width=8) (actual time=0.015..0.015 rows=1 loops=100) Index Cond: ((id IS NOT NULL) AND (id > t.id)) Heap Fetches: 0 -> Append (cost=0.00..16.79 rows=101 width=8) (actual time=0.067..1.918 rows=100 loops=1) -> CTE Scan on t (cost=0.00..2.02 rows=100 width=8) (actual time=0.067..1.899 rows=100 loops=1) Filter: (id IS NOT NULL) Rows Removed by Filter: 1 -> Result (cost=13.75..13.76 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1) One-Time Filter: $5 InitPlan 5 (returns $5) -> Index Only Scan using tablename_id on tablename (cost=0.00..13.75 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1) Index Cond: (id IS NULL) Heap Fetches: 0 Total runtime: 2.040 ms </code></pre> The full table definition looks like this: <pre class="prettyprint"><code>CREATE TABLE tablename ( source character(25), id bigint NOT NULL, time_ timestamp without time zone, height numeric, lon numeric, lat numeric, distance numeric, status character(3), geom geometry(PointZ,4326), relid bigint ) WITH ( OIDS=FALSE ); CREATE INDEX tablename_height ON public.tablename USING btree (height); CREATE INDEX tablename_geom ON public.tablename USING gist (geom); CREATE INDEX tablename_id ON public.tablename USING btree (id); CREATE INDEX tablename_lat ON public.tablename USING btree (lat); CREATE INDEX tablename_lon ON public.tablename USING btree (lon); CREATE INDEX tablename_relid ON public.tablename USING btree (relid); CREATE INDEX tablename_sid ON public.tablename USING btree (source COLLATE pg_catalog."default", id); CREATE INDEX tablename_source ON public.tablename USING btree (source COLLATE pg_catalog."default"); CREATE INDEX tablename_time ON public.tablename USING btree (time_); </code></pre> <hr> Answer selection: I took some time in comparing the approaches that were provided. It's at times like this that I wish that more than one answer could be accepted, but in this case, I'm giving the tick to @jjanes. The reason for this is that his solution matches the question as originally posed more closely, and I was able to get some insights as to the form of the required WHERE statement. In the end, the HashAggregate is actually the fastest approach (for me), but that's due to the nature of my data, not any problems with the algorithms. I've attached the EXPLAIN ANALYZE for the different approaches below, and will be giving +1 to both jjanes and joop. HashAggregate: <pre class="prettyprint"><code> HashAggregate (cost=1018669.72..1029722.08 rows=1105236 width=34) (actual time=24164.735..24686.394 rows=1233080 loops=1) -> Seq Scan on tablename (cost=0.00..908548.48 rows=22024248 width=34) (actual time=0.054..14639.931 rows=22024982 loops=1) Total runtime: 24787.292 ms </code></pre> Loose Index Scan modification <pre class="prettyprint"><code>CTE Scan on t (cost=13.84..15.86 rows=100 width=112) (actual time=0.916..250311.164 rows=1233080 loops=1) Filter: (source IS NOT NULL) Rows Removed by Filter: 1 CTE t -> Recursive Union (cost=0.00..13.84 rows=101 width=112) (actual time=0.911..249295.872 rows=1233081 loops=1) -> Limit (cost=0.00..0.04 rows=1 width=34) (actual time=0.910..0.911 rows=1 loops=1) -> Index Only Scan using tablename_sid on tablename (cost=0.00..965442.32 rows=22024248 width=34) (actual time=0.908..0.908 rows=1 loops=1) Heap Fetches: 0 -> WorkTable Scan on t (cost=0.00..1.18 rows=10 width=112) (actual time=0.201..0.201 rows=1 loops=1233081) Filter: (source IS NOT NULL) Rows Removed by Filter: 0 SubPlan 1 -> Limit (cost=0.00..0.05 rows=1 width=34) (actual time=0.100..0.100 rows=1 loops=1233080) -> Index Only Scan using tablename_sid on tablename (cost=0.00..340173.38 rows=7341416 width=34) (actual time=0.100..0.100 rows=1 loops=1233080) Index Cond: (ROW(source, id) > ROW(t.source, t.id)) Heap Fetches: 0 SubPlan 2 -> Limit (cost=0.00..0.05 rows=1 width=34) (actual time=0.099..0.099 rows=1 loops=1233080) -> Index Only Scan using tablename_sid on tablename (cost=0.00..340173.38 rows=7341416 width=34) (actual time=0.098..0.098 rows=1 loops=1233080) Index Cond: (ROW(source, id) > ROW(t.source, t.id)) Heap Fetches: 0 Total runtime: 250491.559 ms </code></pre> Merge Anti Join <pre class="prettyprint"><code>Merge Anti Join (cost=0.00..12099015.26 rows=14682832 width=42) (actual time=48.710..541624.677 rows=1233080 loops=1) Merge Cond: ((src.source = nx.source) AND (src.id = nx.id)) Join Filter: (nx.time_ > src.time_) Rows Removed by Join Filter: 363464177 -> Index Only Scan using tablename_pkey on tablename src (cost=0.00..1060195.27 rows=22024248 width=42) (actual time=48.566..5064.551 rows=22024982 loops=1) Heap Fetches: 0 -> Materialize (cost=0.00..1115255.89 rows=22024248 width=42) (actual time=0.011..40551.997 rows=363464177 loops=1) -> Index Only Scan using tablename_pkey on tablename nx (cost=0.00..1060195.27 rows=22024248 width=42) (actual time=0.008..8258.890 rows=22024982 loops=1) Heap Fetches: 0 Total runtime: 541750.026 ms </code></pre>

Rather hideous, but this seems to work: <pre class="prettyprint"><code>WITH RECURSIVE t AS ( select a,b from (select a,b from foo order by a,b limit 1) asdf union all select (select a from foo where (a,b) > (t.a,t.b) order by a,b limit 1), (select b from foo where (a,b) > (t.a,t.b) order by a,b limit 1) from t where t.a is not null) select * from t where t.a is not null; </code></pre> I don't really understand why the "is not nulls" are needed, as where do the nulls come from in the first place?

<pre class="prettyprint"><code>DROP SCHEMA zooi CASCADE; CREATE SCHEMA zooi ; SET search_path=zooi,public,pg_catalog; CREATE TABLE tablename ( source character(25) NOT NULL , id bigint NOT NULL , time_ timestamp without time zone NOT NULL , height numeric , lon numeric , lat numeric , distance numeric , status character(3) , geom geometry(PointZ,4326) , relid bigint , PRIMARY KEY (source,id,time_) -- <<-- Primary key here ) WITH ( OIDS=FALSE); -- invent some bogus data INSERT INTO tablename(source,id,time_) SELECT 'SRC_'|| (gs%10)::text ,gs/10 ,gt FROM generate_series(1,1000) gs , generate_series('2013-12-01', '2013-12-07', '1hour'::interval) gt ; </code></pre> Select unique values for two key fields: <pre class="prettyprint"><code>VACUUM ANALYZE tablename; EXPLAIN ANALYZE SELECT source,id,time_ FROM tablename src WHERE NOT EXISTS ( SELECT * FROM tablename nx WHERE nx.source =src.source AND nx.id = src.id AND time_ > src.time_ ) ; </code></pre> Generates this plan here (Pg-9.3): <pre class="prettyprint"><code> QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Hash Anti Join (cost=4981.00..12837.82 rows=96667 width=42) (actual time=547.218..1194.335 rows=1000 loops=1) Hash Cond: ((src.source = nx.source) AND (src.id = nx.id)) Join Filter: (nx.time_ > src.time_) Rows Removed by Join Filter: 145000 -> Seq Scan on tablename src (cost=0.00..2806.00 rows=145000 width=42) (actual time=0.010..210.810 rows=145000 loops=1) -> Hash (cost=2806.00..2806.00 rows=145000 width=42) (actual time=546.497..546.497 rows=145000 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 9063kB -> Seq Scan on tablename nx (cost=0.00..2806.00 rows=145000 width=42) (actual time=0.006..259.864 rows=145000 loops=1) Total runtime: 1197.374 ms (9 rows) </code></pre> The hash-joins will probably disappear once the data outgrows the work_mem: <pre class="prettyprint"><code> Merge Anti Join (cost=0.83..8779.56 rows=29832 width=120) (actual time=0.981..2508.912 rows=1000 loops=1) Merge Cond: ((src.source = nx.source) AND (src.id = nx.id)) Join Filter: (nx.time_ > src.time_) Rows Removed by Join Filter: 184051 -> Index Scan using tablename_sid on tablename src (cost=0.41..4061.57 rows=32544 width=120) (actual time=0.055..250.621 rows=145000 loops=1) -> Index Scan using tablename_sid on tablename nx (cost=0.41..4061.57 rows=32544 width=120) (actual time=0.008..603.403 rows=328906 loops=1) Total runtime: 2510.505 ms </code></pre>

Loose index scan in Postgres on more than one field?

Tags:

indexing

postgresql

group-by

distinct

postgresql-9.2

I have several large tables in Postgres 9.2 (millions of rows) where I need to generate a unique code based on the combination of two fields, 'source' (varchar) and 'id' (int). I can do this by generating row_numbers over the result of:

SELECT source,id FROM tablename GROUP BY source,id

but the results can take a while to process. It has been recommended that if the fields are indexed, and there are a proportionally small number of index values (which is my case), that a loose index scan may be a better option: http://wiki.postgresql.org/wiki/Loose_indexscan

WITH RECURSIVE
     t AS (SELECT min(col) AS col FROM tablename
           UNION ALL
           SELECT (SELECT min(col) FROM tablename WHERE col > t.col) FROM t WHERE t.col IS NOT NULL)
SELECT col FROM t WHERE col IS NOT NULL
UNION ALL
SELECT NULL WHERE EXISTS(SELECT * FROM tablename WHERE col IS NULL);

The example operates on a single field though. Trying to return more than one field generates an error: subquery must return only one column. One possibility might be to try retrieving an entire ROW - e.g. SELECT ROW(min(source),min(id)..., but then I'm not sure what the syntax of the WHERE statement would need to look like to work with individual row elements.

The question is: can the recursion-based code be modified to work with more than one column, and if so, how? I'm committed to using Postgres, but it looks like MySQL has implemented loose index scans for more than one column: http://dev.mysql.com/doc/refman/5.1/en/group-by-optimization.html

As recommended, I'm attaching my EXPLAIN ANALYZE results.

For my situation - where I'm selecting distinct values for 2 columns using GROUP BY, it's the following:

 HashAggregate  (cost=1645408.44..1654099.65 rows=869121 width=34) (actual time=35411.889..36008.475 rows=1233080 loops=1)
   ->  Seq Scan on tablename  (cost=0.00..1535284.96 rows=22024696 width=34) (actual time=4413.311..25450.840 rows=22025768 loops=1)
 Total runtime: 36127.789 ms
(3 rows)

I don't know how to do a 2-column index scan (that's the question), but for purposes of comparison, using a GROUP BY on one column, I get:

 HashAggregate  (cost=1590346.70..1590347.69 rows=99 width=8) (actual time=32310.706..32310.722 rows=100 loops=1)
   ->  Seq Scan on tablename  (cost=0.00..1535284.96 rows=22024696 width=8) (actual time=4764.609..26941.832 rows=22025768 loops=1)
 Total runtime: 32350.899 ms
(3 rows)

But for a loose index scan on one column, I get:

 Result  (cost=181.28..198.07 rows=101 width=8) (actual time=0.069..1.935 rows=100 loops=1)
   CTE t
     ->  Recursive Union  (cost=1.74..181.28 rows=101 width=8) (actual time=0.062..1.855 rows=101 loops=1)
           ->  Result  (cost=1.74..1.75 rows=1 width=0) (actual time=0.061..0.061 rows=1 loops=1)
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.00..1.74 rows=1 width=8) (actual time=0.057..0.057 rows=1 loops=1)
                         ->  Index Only Scan using tablename_id on tablename  (cost=0.00..38379014.12 rows=22024696 width=8) (actual time=0.055..0.055 rows=1 loops=1)
                               Index Cond: (id IS NOT NULL)
                               Heap Fetches: 0
           ->  WorkTable Scan on t  (cost=0.00..17.75 rows=10 width=8) (actual time=0.017..0.017 rows=1 loops=101)
                 Filter: (id IS NOT NULL)
                 Rows Removed by Filter: 0
                 SubPlan 3
                   ->  Result  (cost=1.75..1.76 rows=1 width=0) (actual time=0.016..0.016 rows=1 loops=100)
                         InitPlan 2 (returns $3)
                           ->  Limit  (cost=0.00..1.75 rows=1 width=8) (actual time=0.016..0.016 rows=1 loops=100)
                                 ->  Index Only Scan using tablename_id on tablename  (cost=0.00..12811462.41 rows=7341565 width=8) (actual time=0.015..0.015 rows=1 loops=100)
                                       Index Cond: ((id IS NOT NULL) AND (id > t.id))
                                       Heap Fetches: 0
   ->  Append  (cost=0.00..16.79 rows=101 width=8) (actual time=0.067..1.918 rows=100 loops=1)
         ->  CTE Scan on t  (cost=0.00..2.02 rows=100 width=8) (actual time=0.067..1.899 rows=100 loops=1)
               Filter: (id IS NOT NULL)
               Rows Removed by Filter: 1
         ->  Result  (cost=13.75..13.76 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
               One-Time Filter: $5
               InitPlan 5 (returns $5)
                 ->  Index Only Scan using tablename_id on tablename  (cost=0.00..13.75 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
                       Index Cond: (id IS NULL)
                       Heap Fetches: 0
 Total runtime: 2.040 ms

The full table definition looks like this:

CREATE TABLE tablename
(
  source character(25),
  id bigint NOT NULL,
  time_ timestamp without time zone,
  height numeric,
  lon numeric,
  lat numeric,
  distance numeric,
  status character(3),
  geom geometry(PointZ,4326),
  relid bigint
)
WITH (
  OIDS=FALSE
);

CREATE INDEX tablename_height
  ON public.tablename
  USING btree
  (height);

CREATE INDEX tablename_geom
  ON public.tablename
  USING gist
  (geom);


CREATE INDEX tablename_id
  ON public.tablename
  USING btree
  (id);

CREATE INDEX tablename_lat
  ON public.tablename
  USING btree
  (lat);

CREATE INDEX tablename_lon
  ON public.tablename
  USING btree
  (lon);

CREATE INDEX tablename_relid
  ON public.tablename
  USING btree
  (relid);

CREATE INDEX tablename_sid
  ON public.tablename
  USING btree
  (source COLLATE pg_catalog."default", id);

CREATE INDEX tablename_source
  ON public.tablename
  USING btree
  (source COLLATE pg_catalog."default");

CREATE INDEX tablename_time
  ON public.tablename
  USING btree
  (time_);

Answer selection:

I took some time in comparing the approaches that were provided. It's at times like this that I wish that more than one answer could be accepted, but in this case, I'm giving the tick to @jjanes. The reason for this is that his solution matches the question as originally posed more closely, and I was able to get some insights as to the form of the required WHERE statement. In the end, the HashAggregate is actually the fastest approach (for me), but that's due to the nature of my data, not any problems with the algorithms. I've attached the EXPLAIN ANALYZE for the different approaches below, and will be giving +1 to both jjanes and joop.

HashAggregate:

 HashAggregate  (cost=1018669.72..1029722.08 rows=1105236 width=34) (actual time=24164.735..24686.394 rows=1233080 loops=1)
   ->  Seq Scan on tablename  (cost=0.00..908548.48 rows=22024248 width=34) (actual time=0.054..14639.931 rows=22024982 loops=1)
 Total runtime: 24787.292 ms

Loose Index Scan modification

CTE Scan on t  (cost=13.84..15.86 rows=100 width=112) (actual time=0.916..250311.164 rows=1233080 loops=1)
   Filter: (source IS NOT NULL)
   Rows Removed by Filter: 1
   CTE t
     ->  Recursive Union  (cost=0.00..13.84 rows=101 width=112) (actual time=0.911..249295.872 rows=1233081 loops=1)
           ->  Limit  (cost=0.00..0.04 rows=1 width=34) (actual time=0.910..0.911 rows=1 loops=1)
                 ->  Index Only Scan using tablename_sid on tablename  (cost=0.00..965442.32 rows=22024248 width=34) (actual time=0.908..0.908 rows=1 loops=1)
                       Heap Fetches: 0
           ->  WorkTable Scan on t  (cost=0.00..1.18 rows=10 width=112) (actual time=0.201..0.201 rows=1 loops=1233081)
                 Filter: (source IS NOT NULL)
                 Rows Removed by Filter: 0
                 SubPlan 1
                   ->  Limit  (cost=0.00..0.05 rows=1 width=34) (actual time=0.100..0.100 rows=1 loops=1233080)
                         ->  Index Only Scan using tablename_sid on tablename  (cost=0.00..340173.38 rows=7341416 width=34) (actual time=0.100..0.100 rows=1 loops=1233080)
                               Index Cond: (ROW(source, id) > ROW(t.source, t.id))
                               Heap Fetches: 0
                 SubPlan 2
                   ->  Limit  (cost=0.00..0.05 rows=1 width=34) (actual time=0.099..0.099 rows=1 loops=1233080)
                         ->  Index Only Scan using tablename_sid on tablename  (cost=0.00..340173.38 rows=7341416 width=34) (actual time=0.098..0.098 rows=1 loops=1233080)
                               Index Cond: (ROW(source, id) > ROW(t.source, t.id))
                               Heap Fetches: 0
 Total runtime: 250491.559 ms

Merge Anti Join

Merge Anti Join  (cost=0.00..12099015.26 rows=14682832 width=42) (actual time=48.710..541624.677 rows=1233080 loops=1)
   Merge Cond: ((src.source = nx.source) AND (src.id = nx.id))
   Join Filter: (nx.time_ > src.time_)
   Rows Removed by Join Filter: 363464177
   ->  Index Only Scan using tablename_pkey on tablename src  (cost=0.00..1060195.27 rows=22024248 width=42) (actual time=48.566..5064.551 rows=22024982 loops=1)
         Heap Fetches: 0
   ->  Materialize  (cost=0.00..1115255.89 rows=22024248 width=42) (actual time=0.011..40551.997 rows=363464177 loops=1)
         ->  Index Only Scan using tablename_pkey on tablename nx  (cost=0.00..1060195.27 rows=22024248 width=42) (actual time=0.008..8258.890 rows=22024982 loops=1)
               Heap Fetches: 0
 Total runtime: 541750.026 ms

887

asked Dec 05 '13 05:12

AI1

2 Answers

Rather hideous, but this seems to work:

WITH RECURSIVE
     t AS (
  select a,b from (select a,b from foo order by a,b limit 1) asdf union all 
  select (select a from foo where (a,b) > (t.a,t.b) order by a,b limit 1),
         (select b from foo where (a,b) > (t.a,t.b) order by a,b limit 1) 
     from t where t.a is not null)  
select * from t where t.a is not null;

I don't really understand why the "is not nulls" are needed, as where do the nulls come from in the first place?

102

answered Sep 21 '22 15:09

jjanes

DROP SCHEMA zooi CASCADE;
CREATE SCHEMA zooi ;
SET search_path=zooi,public,pg_catalog;

CREATE TABLE tablename
  ( source character(25) NOT NULL
  , id bigint NOT NULL
  , time_ timestamp without time zone NOT NULL
  , height numeric
  , lon numeric
  , lat numeric
  , distance numeric
  , status character(3)
  , geom geometry(PointZ,4326)
  , relid bigint
        , PRIMARY KEY (source,id,time_) -- <<-- Primary key here
  ) WITH ( OIDS=FALSE);


  -- invent some bogus data
INSERT INTO tablename(source,id,time_)
SELECT 'SRC_'|| (gs%10)::text
        ,gs/10
        ,gt
FROM generate_series(1,1000) gs
        , generate_series('2013-12-01', '2013-12-07', '1hour'::interval) gt
        ;

Select unique values for two key fields:

VACUUM ANALYZE tablename;

EXPLAIN ANALYZE
SELECT source,id,time_
FROM tablename src
WHERE NOT EXISTS (
        SELECT * FROM tablename nx
        WHERE nx.source =src.source
        AND nx.id = src.id
        AND time_  > src.time_
        )
        ;

Generates this plan here (Pg-9.3):

                                                     QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=4981.00..12837.82 rows=96667 width=42) (actual time=547.218..1194.335 rows=1000 loops=1)
   Hash Cond: ((src.source = nx.source) AND (src.id = nx.id))
   Join Filter: (nx.time_ > src.time_)
   Rows Removed by Join Filter: 145000
   ->  Seq Scan on tablename src  (cost=0.00..2806.00 rows=145000 width=42) (actual time=0.010..210.810 rows=145000 loops=1)
   ->  Hash  (cost=2806.00..2806.00 rows=145000 width=42) (actual time=546.497..546.497 rows=145000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 9063kB
         ->  Seq Scan on tablename nx  (cost=0.00..2806.00 rows=145000 width=42) (actual time=0.006..259.864 rows=145000 loops=1)
 Total runtime: 1197.374 ms
(9 rows)

The hash-joins will probably disappear once the data outgrows the work_mem:

 Merge Anti Join  (cost=0.83..8779.56 rows=29832 width=120) (actual time=0.981..2508.912 rows=1000 loops=1)
   Merge Cond: ((src.source = nx.source) AND (src.id = nx.id))
   Join Filter: (nx.time_ > src.time_)
   Rows Removed by Join Filter: 184051
   ->  Index Scan using tablename_sid on tablename src  (cost=0.41..4061.57 rows=32544 width=120) (actual time=0.055..250.621 rows=145000 loops=1)
   ->  Index Scan using tablename_sid on tablename nx  (cost=0.41..4061.57 rows=32544 width=120) (actual time=0.008..603.403 rows=328906 loops=1)
 Total runtime: 2510.505 ms

answered Sep 21 '22 15:09

joop

Related questions
                            
                                Spring Data Rest - PATCH Postgres jsonb field
                            
                                Run docker image with docker-compose
                            
                                Pass date parameter to native query
                            
                                R RPostgreSQL Connect to Remote Postgres Database with SSL
                            
                                Rails add_index algorithm: :concurrently still causes database lock up during migration
                            
                                Array difference in postgresql
                            
                                How to return dictonary or json if I use psycopg2?
                            
                                How to prevent or avoid running update and delete statements without where clauses in PostgreSQL
                            
                                Select specific columns from jeft join query, TypeORM
                            
                                Is it possible to use MongoDb and PostgreSql for same model in Spring boot?
                            
                                Is it safe to set MySQL isolation to “Read Uncommitted” (dirty reads) for typical Web usage? Even with replication?
                            
                                PostgreSQL throws error: "ERROR: permission denied for relation table_name"
                            
                                Unique constraint over multiple tables
                            
                                Liquibase: relation "databasechangeloglock" already exists, using grails plugin and non-default schema
                            
                                ActiveRecord::StatementInvalid: PG::Error: ERROR: cannot execute UPDATE in a read-only transaction error in Heroku
                            
                                Postgres WARNING: errors ignored on restore: 59
                            
                                FSharp data type provider for Postgresql
                            
                                Can the Postgres array overlap (&&) operator use an index?
                            
                                password authentication failed for user "postgres" on mac
                            
                                Rails 4 ActiveRecord throws PG::UnableToSend on Ubuntu 13.04

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With