I am trying to select duplicate records based on a match of three columns. The list of triples could be very long (1000), so I would like to make it concise.
When I have a list of size 10 (known duplicates) it only matches 2 (seemingly random ones) and misses the other 8. I expected 10 records to return, but only saw 2.
I've narrowed it down to this problem:
This returns one record. Expecting 2:
select * 
from ali
where (accountOid, dt, x) in
(
  (64, '2014-03-01', 10000.0), 
  (64, '2014-04-23', -122.91)
)
Returns two records, as expected:
select * 
from ali
where (accountOid, dt, x) in ( (64, '2014-03-01', 10000.0) )
or (accountOid, dt, x) in ( (64, '2014-04-23', -122.91) )
Any ideas why the first query only returns one record?
The WHERE clause specifies a condition for rows (or row combinations) in these tables that are considered in this query (absence of C is equivalent to C ≡ TRUE). The SELECT clause specifies the attributes of the result tuples (* ≡ output all attributes occurring in R1,...,Rm).
Multiple WHERE Clause Conditionals Using the MySQL AND Logical Operator. Oftentimes, you need multiple filter conditionals in a WHERE clause in order to target specific rows of data.
GROUP BY: This requires one (or multiple) column names based on which we want the results to be aggregated. [HAVING condition]: This is an optional condition that could be specified for SELECT queries that use the GROUP BY clause. It's similar to a WHERE clause in a normal SELECT query.
I'd suggest you don't use IN() for this, instead use a where exists query, e.g.:
CREATE TABLE inlist
    (`id` int, `accountOid` int, `dt` datetime, `x` decimal(18,4))
;
INSERT INTO inlist
    (`id`, `accountOid`, `dt`, `x`)
VALUES
    (1, 64, '2014-03-01 00:00:00', 10000.0),
    (2, 64, '2014-04-23 00:00:00', -122.91)
;
select *
from ali
where exists ( select null
               from inlist
               where ali.accountOid = inlist.accountOid
               and ali.dt = inlist.dt
               and ali.x = inlist.x
             )
;
I was able to reproduce a problem (compare http://sqlfiddle.com/#!2/7d2658/6 to http://sqlfiddle.com/#!2/fe851/1 both MySQL 5.5.3) where if the x column was numeric and the value negative it was NOT matched using IN() but was matched when either numeric or decimal using a table and where exists.
Perhaps not a conclusive test but personally I wouldn't have used IN() for this anyway.
Why are you not determining the duplicates this way?
select
        accountOid
      , dt
      , x
from ali
group by
        accountOid
      , dt
      , x
having
        count(*) > 1
Then use that as a derived table within the where exists condition:
select *
from ali
where exists (
               select null
               from (
                      select
                              accountOid
                            , dt
                            , x
                      from ali
                      group by
                              accountOid
                            , dt
                            , x
                      having
                              count(*) > 1
                     ) as inlist
               where ali.accountOid = inlist.accountOid
               and ali.dt = inlist.dt
               and ali.x = inlist.x
             )
see http://sqlfiddle.com/#!2/ede292/1 for the query immediately above
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With