Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sql avoid cartesian product

Tags:

sql

I'm pretty new to SQL and am struggling with a query (using Access, FWIW). I have Googled and searched StackOverflow, but I haven't seen this exact scenario. (That could also be because I don't know the correct search terms.)

I have two pretty simple tables that contain similar data.

table1: state, lname, fname, network
table2: state, lname, fname, network

What I want is to find each person/state combo that match in the two tables plus the networks from each table that the person is in:

state, lname, fname, t1.network, t2.network.  

The person may be in more than one network in each table. I want to see each network (from both tables) that the person belongs to.

I started by using a JOIN as below:

SELECT t1.state, t1.lname, t1.fname, t1.network, t2.network
FROM t1 INNER JOIN t2 
ON t1.fname=t2.fname AND t1.lname=t2.lname AND t1.state=t2.state
GROUP BY t1.state, t1.lname, t1.fname, t1.network, t2.network

I quickly figured out that I get a Cartesian product. So if "NY, Smith, John" was in two networks in t1 and three networks in t2 I would get something like this:

NY, Smith, John, NetworkA, NetworkB
NY, Smith, John, NetworkA, NetworkA
NY, Smith, John, NetworkB, NetworkA
NY, Smith, John, NetworkB, NetworkB
NY, Smith, John, NetworkA, NetworkC
NY, Smith, John, NetworkB, NetworkC

What I really want to see is just:

NY, Smith, John, NetworkA, NetworkA
NY, Smith, John, NetworkB, NetworkB
NY, Smith, John, NULL, NetworkC

Can anyone give me some advice on how to proceed or point me in the right direction?

like image 517
user2014025 Avatar asked Jan 26 '13 19:01

user2014025


People also ask

How avoid Cartesian join in SQL?

To avoid a Cartesian product, you must specify how the tables should be combined. Typically, you want to pair rows based on matching values in one or more key columns of each table.

How many JOINs are required to avoid Cartesian product?

The following method may be used to avoid Cartesian products when you create a join. Usually you should have at least t-1 join conditions for a join between t tables.

How do you handle Cartesian product in SQL?

In SQL Server, the cartesian product is really a cross-join which returns all the rows in all the tables listed in a query: each row in the first table is paired with all the rows in the second table. This happens when there is no relationship defined between the two tables.

What causes Cartesian product in SQL?

A Cartesian product is the result of joining every row in one table with every row in another table. This occurs when there is no WHERE clause to restrict rows.


1 Answers

So it looks like you want all records from each of tables that are identical, and then only those from each that are distinct. That means you need to UNION 3 sets of queries.

Try something like this:

SELECT t1.state, 
   t1.lname, 
   t1.fname, 
   t1.network as t1Network, 
   t2.network as t2Network
FROM table1 t1 
   INNER JOIN table2 t2 
      ON t1.fname=t2.fname 
      AND t1.lname=t2.lname 
      AND t1.state=t2.state
      AND t1.network=t2.network
UNION 
SELECT t1.state, 
   t1.lname, 
   t1.fname, 
   t1.network as t1Network, 
   t2.network as t2Network
FROM table1 t1 
   LEFT JOIN table2 t2 
      ON t1.fname=t2.fname 
      AND t1.lname=t2.lname 
      AND t1.state=t2.state
      AND t1.network=t2.network
WHERE t2.network IS NULL
UNION 
SELECT t2.state, 
   t2.lname, 
   t2.fname, 
   t1.network as t1Network, 
   t2.network as t2Network
FROM table2 t2 
   LEFT JOIN table1 t1
      ON t1.fname=t2.fname 
      AND t1.lname=t2.lname 
      AND t1.state=t2.state
      AND t1.network=t2.network
WHERE t1.network IS NULL

This should give you your desired results.

And here is the SQL Fiddle to confirm.

--EDIT

Not thinking today -- you don't really need that first query. You can remove the WHERE condition from the 2nd query and it works the same way. Tired :-)

Here is the updated query -- both should work just fine though, this is just easier to read:

SELECT t1.state, 
   t1.lname, 
   t1.fname, 
   t1.network as t1Network, 
   t2.network as t2Network
FROM table1 t1 
   LEFT JOIN table2 t2 
      ON t1.fname=t2.fname 
      AND t1.lname=t2.lname 
      AND t1.state=t2.state
      AND t1.network=t2.network
UNION 
SELECT t2.state, 
   t2.lname, 
   t2.fname, 
   t1.network as t1Network, 
   t2.network as t2Network
FROM table2 t2 
   LEFT JOIN table1 t1
      ON t1.fname=t2.fname 
      AND t1.lname=t2.lname 
      AND t1.state=t2.state
      AND t1.network=t2.network
WHERE t1.network IS NULL

And the updated fiddle.

BTW -- these should both work in MSAccess as it supports UNION.

Good luck.

like image 143
sgeddes Avatar answered Oct 01 '22 22:10

sgeddes