Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL: filtering rows

I am trying to write a SQL query that returns rows from a table containing data:

The table structure is as follows:

CREATE TABLE person(
    id INT PRIMARY KEY,
    name TEXT,
    operation TEXT);

I want to return all unique rows of name which have not been "cancelled" out. A row is considered to be "cancelled" out if the operation is either "insert" or "delete" and there exists another row with the same name with the opposite operation.

For example, if I have the following rows

id   name   operation
1    bob    insert
2    bob    delete
3    bob    insert

The first 2 rows "cancel" each other out because they share the same name with opposite operations. So the query should return row 3.

Here is another example:

id   name   operation
1    bob    insert
2    bob    delete
3    bob    insert
4    bob    delete

In this case, rows 1 and 2 cancel out, and rows 3 and 4 cancel out. So the query should not return any rows.

Last example:

id   name   operation
1    bob    insert
2    bob    insert

In this case, rows 1 and 2 do not cancel out because the operations are not opposite. So the query should return both rows.

I have the following query which handles the first two scenarios, but it doesn't handle the final scenario.

Does anyone have any suggestion for a query that can handle all 3 scenarios?

SELECT MAX(id),name 
FROM person z 
WHERE operation IN ('insert','delete') 
GROUP BY name 
HAVING count(1) % 2 = 1;
like image 932
Jin Kim Avatar asked Jan 11 '12 17:01

Jin Kim


1 Answers

One way is to compare the operation counts. Since you'll also need to get the number of INSERTS or DELETES that correspond with InsertCount - deleteCount or InsertCount - deleteCount and since PostgreSQL supports window function you should be able to use row_number().

Note: I've not tested this so but according to this PostgreSQL manual Chapter 3. Advanced Features, 3.5 Window functions you can refer to a Window Function in an inline query

SELECT
       id, name
FROM
   (
    SELECT 
            row_number() over (partition by p.name, p.operation order by p.id desc) rn , 
            id,  
            p.Name,
            p.operation, 
            operationCounts.InsertCount,
            operationCounts.deleteCount

    FROM 
       Person p
    INNER JOIN (

        SELECT 
          SUM(CASE WHEN operation = 'insert' then 1 else 0 END) InsertCount,
          SUM(CASE WHEN operation = 'delete' then 1 else 0 END) deleteCount,
          name 
        FROM 
           person 
        GROUP BY
           name ) operationCounts
    ON p.name = operationCounts.name
    WHERE 
      operationCounts.InsertCount <> operationCounts.deleteCount) data
WHERE
      (rn <=  (InsertCount -  deleteCount)
      and operation = 'insert')
      OR
     (rn <=  (deleteCount -  InsertCount)
      and operation = 'delete')
like image 56
Conrad Frix Avatar answered Sep 26 '22 14:09

Conrad Frix