Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySql IN clauses, trying to match IN list of tuples

I am trying to select duplicate records based on a match of three columns. The list of triples could be very long (1000), so I would like to make it concise.

When I have a list of size 10 (known duplicates) it only matches 2 (seemingly random ones) and misses the other 8. I expected 10 records to return, but only saw 2.

I've narrowed it down to this problem:

This returns one record. Expecting 2:

select * 
from ali
where (accountOid, dt, x) in
(
  (64, '2014-03-01', 10000.0), 
  (64, '2014-04-23', -122.91)
)

Returns two records, as expected:

select * 
from ali
where (accountOid, dt, x) in ( (64, '2014-03-01', 10000.0) )
or (accountOid, dt, x) in ( (64, '2014-04-23', -122.91) )

Any ideas why the first query only returns one record?

like image 680
user3877299 Avatar asked Jul 25 '14 14:07

user3877299


People also ask

Which clause in SQL is used to specify the number of tuples present in a database table?

The WHERE clause specifies a condition for rows (or row combinations) in these tables that are considered in this query (absence of C is equivalent to C ≡ TRUE). The SELECT clause specifies the attributes of the result tuples (* ≡ output all attributes occurring in R1,...,Rm).

Can you have multiple WHERE clauses in MySQL?

Multiple WHERE Clause Conditionals Using the MySQL AND Logical Operator. Oftentimes, you need multiple filter conditionals in a WHERE clause in order to target specific rows of data.

Can we use group by with WHERE clause in MySQL?

GROUP BY: This requires one (or multiple) column names based on which we want the results to be aggregated. [HAVING condition]: This is an optional condition that could be specified for SELECT queries that use the GROUP BY clause. It's similar to a WHERE clause in a normal SELECT query.


1 Answers

I'd suggest you don't use IN() for this, instead use a where exists query, e.g.:

CREATE TABLE inlist
    (`id` int, `accountOid` int, `dt` datetime, `x` decimal(18,4))
;

INSERT INTO inlist
    (`id`, `accountOid`, `dt`, `x`)
VALUES
    (1, 64, '2014-03-01 00:00:00', 10000.0),
    (2, 64, '2014-04-23 00:00:00', -122.91)
;

select *
from ali
where exists ( select null
               from inlist
               where ali.accountOid = inlist.accountOid
               and ali.dt = inlist.dt
               and ali.x = inlist.x
             )
;

I was able to reproduce a problem (compare http://sqlfiddle.com/#!2/7d2658/6 to http://sqlfiddle.com/#!2/fe851/1 both MySQL 5.5.3) where if the x column was numeric and the value negative it was NOT matched using IN() but was matched when either numeric or decimal using a table and where exists.

Perhaps not a conclusive test but personally I wouldn't have used IN() for this anyway.

Why are you not determining the duplicates this way?

select
        accountOid
      , dt
      , x
from ali
group by
        accountOid
      , dt
      , x
having
        count(*) > 1

Then use that as a derived table within the where exists condition:

select *
from ali
where exists (
               select null
               from (
                      select
                              accountOid
                            , dt
                            , x
                      from ali
                      group by
                              accountOid
                            , dt
                            , x
                      having
                              count(*) > 1
                     ) as inlist
               where ali.accountOid = inlist.accountOid
               and ali.dt = inlist.dt
               and ali.x = inlist.x
             )

see http://sqlfiddle.com/#!2/ede292/1 for the query immediately above

like image 148
Paul Maxwell Avatar answered Nov 06 '22 19:11

Paul Maxwell