Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to compare two columns to find unmatched records in MySQL

Tags:

mysql

I have a MySQL table with 2 columns and each column has thousands of records

For Example 15000 Email addresses in Column1 and 15005 Email addresses in column 2

How to find those 5 records from 15005 which are unmatched in column1?

I wish MySql query to compare both columns and give result of only 5 unmatched records

Thanks

like image 673
ronniekap Avatar asked Oct 02 '11 18:10

ronniekap


Video Answer


2 Answers

Not sure if I got it right... but would it be something like?

select column2 from table
where column2 not in (select column1 from table)
like image 94
Mosty Mostacho Avatar answered Nov 04 '22 21:11

Mosty Mostacho


Richard, it's highly unusual to find matching/missing rows from one column in a table compared against another column in the same table.

You can think of a table as being a collection of facts, with each row being one fact. Converting values into predicates is how we understand the data. The value "12" in one table may mean "there exists a day on which 12 widgets were made," or "12 people bought widgets on Jan. 1," or "on Jan. 12, no widgets were sold," but whatever the table's corresponding predicate is, "12" should represent a fact.

It's common to want to find the difference between two tables: "what facts are in B that aren't in A?" But in a table with two columns, each row should conceptually be a fact about that pair of values. Perhaps the predicate for the row (12, 13) might be "on Jan. 12, we sold 13 widgets." But in that case I doubt you'd be asking for this information.

So, if (12,13) is really two of the same predicate -- "someone in district 12 bought widgets, and also, someone in district 13 bought widgets" -- in the long run life will be easier if those are one column, not two. And if it's two different predicates, it would make more sense for them to be in two tables. SQL's flexible and can handle these situations, but you may run into more problems later. If you're interested in more about this subject, searching on "normalization" will find you way more than you want to know :)

Anyway, I think the query you're looking for uses a LEFT JOIN to compare the table against itself. I added the values 1-15000 to col1 and 1-15005 to col2 in this table:

CREATE TABLE `foo` (
  `col1` int(11) DEFAULT NULL,
  `col2` int(11) DEFAULT NULL,
  KEY `idx_col1` (`col1`),
  KEY `idx_col2` (`col2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

mysql> select count(distinct col1), count(distinct col2) from foo;
+----------------------+----------------------+
| count(distinct col1) | count(distinct col2) |
+----------------------+----------------------+
|                15000 |                15005 |
+----------------------+----------------------+
1 row in set (0.01 sec)

By giving the same table two names, I can compare its two columns against each other, and find the col2 values that have no corresponding col1 values -- in those cases, f1.col1 will be NULL:

mysql> select f2.col2
from foo as f2 left join foo as f1 on (f2.col2=f1.col1)
where f1.col1 is null;
+-------+
| col2  |
+-------+
| 15001 |
| 15002 |
| 15003 |
| 15004 |
| 15005 |
+-------+
5 rows in set (0.03 sec)

Regarding Mosty's solution yesterday, I'm not sure it's correct. I try not to use subqueries, so I'm a little out of my depth here. But it doesn't seem to work for at least my attempt to replicate your data set:

mysql> select col2 from foo where col2 not in
(select col1 from foo);
Empty set (0.02 sec)

It works if I exclude the 5 NULLs from the subquery, which suggests to me that "NOT IN (NULL)" doesn't necessarily work the way one might think it works:

mysql> select col2 from foo where col2 not in
(select col1 from foo where col1 is not null);
+-------+
| col2  |
+-------+
| 15001 |
| 15002 |
| 15003 |
| 15004 |
| 15005 |
+-------+
5 rows in set (0.02 sec)

The main reason I avoid subqueries in MySQL is that they have unpredictable performance characteristics, or at least, complex enough that I can't predict them. For more information, see the "O(MxN)" comment in http://dev.mysql.com/doc/refman/5.5/en/subquery-restrictions.html and the advice on the short webpage http://dev.mysql.com/doc/refman/5.5/en/rewriting-subqueries.html .

like image 24
Jamie McCarthy Avatar answered Nov 04 '22 21:11

Jamie McCarthy