PostgreSQL - column value changed - select query optimization

Tags: sql, postgresql, window-functions, gaps-and-islands

Say we have a table:

CREATE TABLE p
(
   id serial NOT NULL, 
   val boolean NOT NULL, 
   PRIMARY KEY (id)
);

Populated with some rows:

insert into p (val)
values (true),(false),(false),(true),(true),(true),(false);

ID  VAL
1   1
2   0
3   0
4   1
5   1
6   1
7   0

I want to find the rows where the value changes compared to the previous row. So the result of my query should be:

ID  VAL
2   0
4   1
7   0

I have a solution with joins and subqueries:

select min(id) id, val from
(
  select p1.id, p1.val, max(p2.id) last_prev
  from p p1
  join p p2
    on p2.id < p1.id and p2.val != p1.val
  group by p1.id, p1.val
) tmp
group by val, last_prev
order by id;

But it is very inefficient and will run extremely slowly on tables with many rows.
I believe there could be a more efficient solution using PostgreSQL window functions?

SQL Fiddle

asked Jun 07 '14 15:06

Nailgun

1 Answer

Window function

Instead of calling COALESCE, you can provide a default from the window function lag() directly. A minor detail in this case since all columns are defined NOT NULL. But this may be essential to distinguish "no previous row" from "NULL in previous row".

SELECT id, val
FROM  (
   SELECT id, val, lag(val, 1, val) OVER (ORDER BY id) <> val AS changed
   FROM   p
   ) sub
WHERE  changed
ORDER  BY id;

Compute the result of the comparison immediately, since the previous value is not of interest per se, only a possible change. Shorter and may be a tiny bit faster.
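
To see what the subquery computes before the filter is applied, it can help to expose the previous value next to the flag. A minimal sketch against the table p from the question; the column alias prev_val is only illustrative:

```sql
-- Show every row together with the value lag() sees and the resulting flag.
-- The third argument (val) is the default used for the first row,
-- so the first row compares equal to itself and is not flagged.
SELECT id
     , val
     , lag(val, 1, val) OVER (ORDER BY id)        AS prev_val
     , lag(val, 1, val) OVER (ORDER BY id) <> val AS changed
FROM   p
ORDER  BY id;
```

With the sample data, changed should come out true only for ids 2, 4 and 7, which matches the requested result.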

If you consider the first row to be "changed" (contrary to what your demo output suggests), you need to account for NULL values, even though your columns are defined NOT NULL. Plain lag() returns NULL when there is no previous row:

SELECT id, val
FROM  (
   SELECT id, val, lag(val) OVER (ORDER BY id) IS DISTINCT FROM val AS changed
   FROM   p
   ) sub
WHERE  changed
ORDER  BY id;
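
Why a plain <> is not enough for the first row can be checked in isolation. A small sketch; the column aliases are only illustrative:

```sql
-- A comparison with NULL yields NULL, which WHERE treats like false.
-- IS DISTINCT FROM treats NULL as a comparable value and always
-- returns true or false.
SELECT NULL::boolean <> true               AS plain_compare
     , NULL::boolean IS DISTINCT FROM true AS distinct_compare;
```

plain_compare is NULL while distinct_compare is true, so only the second form flags a row that has no predecessor.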

Or employ the additional parameters of lag() once again:

SELECT id, val
FROM  (
   SELECT id, val, lag(val, 1, NOT val) OVER (ORDER BY id) <> val AS changed
   FROM   p
   ) sub
WHERE  changed
ORDER  BY id;
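
The three variants above differ only in how they treat the first row. One way to verify that is to compute all three flags side by side; this is a sketch with made-up column aliases, using a named window to avoid repeating the ORDER BY:

```sql
-- Compare the three "changed" expressions row by row.
SELECT id
     , val
     , lag(val, 1, val)     OVER w <> val               AS changed_skip_first
     , lag(val)             OVER w IS DISTINCT FROM val AS changed_distinct
     , lag(val, 1, NOT val) OVER w <> val               AS changed_not_default
FROM   p
WINDOW w AS (ORDER BY id)
ORDER  BY id;
```

For id 1, changed_skip_first is false and the other two flags are true. For every later row the three flags agree.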

Recursive CTE

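As a proof of concept. :) Performance won't keep up with the alternatives above. Note that the non-recursive part returns the row with the smallest id, so this variant includes the first row in the result.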

WITH RECURSIVE cte AS (
   SELECT id, val
   FROM   p
   WHERE  NOT EXISTS (
      SELECT 1
      FROM   p p0
      WHERE  p0.id < p.id
      )
  
   UNION ALL
   SELECT p.id, p.val
   FROM   cte
   JOIN   p ON p.id   > cte.id
           AND p.val <> cte.val
   WHERE NOT EXISTS (
     SELECT 1
     FROM   p p0
     WHERE  p0.id   > cte.id
     AND    p0.val <> cte.val
     AND    p0.id   < p.id
     )
  )
SELECT * FROM cte;

With an improvement from @wildplasser.

SQL Fiddle demonstrating all.
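
To check the performance claims on a larger data set, one could generate test rows and look at actual execution times. A rough sketch, assuming a scratch table named p_big (hypothetical) and an arbitrary row count; timings depend on hardware, configuration and PostgreSQL version:

```sql
-- Hypothetical scratch table with the same layout as p.
CREATE TEMP TABLE p_big (
   id  serial  PRIMARY KEY
 , val boolean NOT NULL
);

-- 100000 rows of random booleans; the row count is arbitrary.
INSERT INTO p_big (val)
SELECT random() < 0.5
FROM   generate_series(1, 100000);

-- Refresh planner statistics for the new table.
ANALYZE p_big;

-- EXPLAIN ANALYZE executes the query and reports the actual run time.
EXPLAIN ANALYZE
SELECT id, val
FROM  (
   SELECT id, val, lag(val, 1, val) OVER (ORDER BY id) <> val AS changed
   FROM   p_big
   ) sub
WHERE  changed
ORDER  BY id;
```

The same EXPLAIN ANALYZE prefix can be put in front of the other queries for comparison. For the self-join from the question, a much smaller row count is advisable.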

answered Sep 23 '22 00:09

Erwin Brandstetter
