Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select rows where ColA=ColB and also add a new column that tells whether the original data in ColA is repeated

I am trying to figure out how to query a table (the table is actually a result set, so it will be a subquery), group it by ColA=ColB (see below), and create a calculated field all in one step.

So, if my test data looks like

ColA    ColB    ColC 
1       1       aaa 
1       2       bbbb    
1       3       cccc    
2       2       dddd    
3       3       eeee    
3       4       ffff    
3       5       gggg    
3       6       hhhh    
4       4       iiii    
5       5       jjjj    
6       6       kkkk    
6       7       llll    
6       8       mmmm    

I would like to retrieve only rows where ColA=ColB and also add a new column that tells me whether the original data in ColA was repeated. See below.

ColA   ColB    ColC    multiples
1      1       aaaa    yes
2       2       dddd    no
3       3       eeee    yes
4       4       iiii    no
5       5       jjjj    no
6       6       kkkk    yes

Can someone help me out with the syntax? I have been playing with Group By's and SubSelects to no avail. Do I need to use a case statement to for the multiples field?

like image 951
Shereif Avatar asked Dec 08 '25 23:12

Shereif


2 Answers

It's more helpful to post create table and insert statements instead of Desc table and select * from table_name; http://tkyte.blogspot.com/2005/06/how-to-ask-questions.html

create table test_repeat(
   cola number,
   colb number,
   colc varchar2(20)
   );
     
insert into test_repeat values (1,1,'aaa'); 
insert into test_repeat values (1,2,'bbbb');    
insert into test_repeat values (1,3,'cccc');    
insert into test_repeat values (2,2,'dddd');    
insert into test_repeat values (3,3,'eeee'); 
insert into test_repeat values (3,4,'ffff');    
insert into test_repeat values (3,5,'gggg');    
insert into test_repeat values (3,6,'hhhh');    
insert into test_repeat values (4,4,'iiii');    
insert into test_repeat values (5,5,'jjjj');    
insert into test_repeat values (6,6,'kkkk');    
insert into test_repeat values (6,7,'llll');    
insert into test_repeat values (6,8,'mmmm');
commit;
  1. You can use the Oracle analytic function Lead to look through your result set to see if colA is the same for the next row (after ordering it..) like..

    select * from (select colA, colb, (case when colA = (lead(cola) over (partition by colA order by cola, colb)) then 'Yes' else 'No' end) multiples, colc from test_repeat) where colA = colb /

       COLA       COLB MUL COLC
    

          1          1 Yes aaa
          2          2 No  dddd
          3          3 Yes eeee
          4          4 No  iiii
          5          5 No  jjjj
          6          6 Yes kkkk
    
  2. Or you can get the count for each value of COLA and compare it to see if there are duplicates...

    select a.colA, a.colb, a.colc, (case when (select count(*) from test_repeat t where t.cola = a.colA) > 1 then 'Yes' else 'No' end) Repeat from test_repeat a where colA = colB /

       COLA       COLB COLC                 REP
    

          1          1 aaa                  Yes
          2          2 dddd                 No
          3          3 eeee                 Yes
          4          4 iiii                 No
          5          5 jjjj                 No
          6          6 kkkk                 Yes
    

They are both equally simple, but I would suggest the analytic function approach as I have found it to be generally faster for all Queries I have worked with in the past.

like image 61
Rajesh Chamarthi Avatar answered Dec 11 '25 12:12

Rajesh Chamarthi


SQL> select *
  2    from test_repeat
  3   order by cola
  4  /

      COLA       COLB COLC
---------- ---------- --------------------
         1          2 bbbb
         1          1 aaa
         1          3 cccc
         2          2 dddd
         3          4 ffff
         3          3 eeee
         3          5 gggg
         3          6 hhhh
         4          4 iiii
         5          5 jjjj
         6          6 kkkk
         6          7 llll
         6          8 mmmm
         7          9 nnnn

14 rows selected.

SQL> select cola "ColA"
  2       , max(decode(colb,cola,colb)) "ColB"
  3       , max(decode(colb,cola,colc)) "ColC"
  4       , case count(*) when 1 then 'no' else 'yes' end "multiples"
  5    from test_repeat
  6   group by cola
  7  having cola = max(decode(colb,cola,colb))
  8   order by cola
  9  /

      ColA       ColB ColC                 mul
---------- ---------- -------------------- ---
         1          1 aaa                  yes
         2          2 dddd                 no
         3          3 eeee                 yes
         4          4 iiii                 no
         5          5 jjjj                 no
         6          6 kkkk                 yes

6 rows selected.

Performancewise, this SQL is approximately equal to Rajesh' first query. So you can choose the one you are more comfortable with.

SQL> set autotrace on
SQL> select * from
  2  (select colA, colb,
  3         (case when colA = (lead(cola) over
  4                            (partition by colA order by cola, colb))
  5              then 'Yes'
  6              else 'No'
  7         end) multiples,
  8         colc
  9    from test_repeat)
 10    where colA = colb
 11  /

      COLA       COLB MUL COLC
---------- ---------- --- --------------------
         1          1 Yes aaa
         2          2 No  dddd
         3          3 Yes eeee
         4          4 No  iiii
         5          5 No  jjjj
         6          6 Yes kkkk

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1491815685

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |    14 |   574 |     4  (25)| 00:00:01 |
|*  1 |  VIEW               |             |    14 |   574 |     4  (25)| 00:00:01 |
|   2 |   WINDOW SORT       |             |    14 |   532 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| TEST_REPEAT |    14 |   532 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COLA"="COLB")

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
         15  consistent gets
          0  physical reads
          0  redo size
        421  bytes sent via SQL*Net to client
        238  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          6  rows processed

SQL> /

      COLA       COLB MUL COLC
---------- ---------- --- --------------------
         1          1 Yes aaa
         2          2 No  dddd
         3          3 Yes eeee
         4          4 No  iiii
         5          5 No  jjjj
         6          6 Yes kkkk

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1491815685

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |    14 |   574 |     4  (25)| 00:00:01 |
|*  1 |  VIEW               |             |    14 |   574 |     4  (25)| 00:00:01 |
|   2 |   WINDOW SORT       |             |    14 |   532 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| TEST_REPEAT |    14 |   532 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COLA"="COLB")

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        421  bytes sent via SQL*Net to client
        238  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          6  rows processed

SQL> select cola "ColA"
  2       , max(decode(colb,cola,colb)) "ColB"
  3       , max(decode(colb,cola,colc)) "ColC"
  4       , case count(*) when 1 then 'no' else 'yes' end "multiples"
  5    from test_repeat
  6   group by cola
  7  having cola = max(decode(colb,cola,colb))
  8   order by cola
  9  /

      ColA       ColB ColC                 mul
---------- ---------- -------------------- ---
         1          1 aaa                  yes
         2          2 dddd                 no
         3          3 eeee                 yes
         4          4 iiii                 no
         5          5 jjjj                 no
         6          6 kkkk                 yes

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3021378319

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |    14 |   532 |     4  (25)| 00:00:01 |
|*  1 |  FILTER             |             |       |       |            |          |
|   2 |   SORT GROUP BY     |             |    14 |   532 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| TEST_REPEAT |    14 |   532 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COLA"=MAX(DECODE("COLB","COLA","COLB")))

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        421  bytes sent via SQL*Net to client
        238  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          6  rows processed

SQL> /

      ColA       ColB ColC                 mul
---------- ---------- -------------------- ---
         1          1 aaa                  yes
         2          2 dddd                 no
         3          3 eeee                 yes
         4          4 iiii                 no
         5          5 jjjj                 no
         6          6 kkkk                 yes

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3021378319

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |    14 |   532 |     4  (25)| 00:00:01 |
|*  1 |  FILTER             |             |       |       |            |          |
|   2 |   SORT GROUP BY     |             |    14 |   532 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| TEST_REPEAT |    14 |   532 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COLA"=MAX(DECODE("COLB","COLA","COLB")))

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        420  bytes sent via SQL*Net to client
        238  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          6  rows processed

SQL> set autotrace off
like image 44
Rob van Wijk Avatar answered Dec 11 '25 11:12

Rob van Wijk



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!