Clear explanation of the "theta join" in relational algebra?

I'm looking for a clear, basic explanation of the concept of theta join in relational algebra and perhaps an example (using SQL perhaps) to illustrate its usage.

If I understand it correctly, the theta join is a natural join with a condition added in. So, whereas the natural join enforces equality between attributes of the same name (and removes the duplicate?), the theta join does the same thing but adds in a condition. Do I have this right? Any clear explanation, in simple terms (for a non-mathematician), would be greatly appreciated.

Also (sorry to just throw this in at the end, but it's sort of related), could someone explain the importance or idea of the Cartesian product? I think I'm missing something with regard to the basic concept, because to me it just seems like a restating of a basic fact, i.e. that a set of 13 X a set of 4 = 52...

asked Sep 27 '11 01:09

LuxuryMode

2 Answers

Leaving SQL aside for a moment...

A relational operator takes one or more relations as parameters and returns a relation. Because a relation has no attributes with duplicate names by definition, the relational operations theta join and natural join will both "remove the duplicate attributes." [A big problem with posting examples in SQL to explain relational operations, as you requested, is that the result of a SQL query is not a relation because, among other sins, it can have duplicate rows and/or columns.]

The relational Cartesian product operation (results in a relation) differs from set Cartesian product (results in a set of pairs). The word 'Cartesian' isn't particularly helpful here. In fact, Codd called his primitive operator 'product'.

The truly relational language Tutorial D lacks a product operator and product is not a primitive operator in the relational algebra proposed by co-author of Tutorial D, Hugh Darwen**. This is because the natural join of two relations with no attribute names in common results in the same relation as the product of the same two relations i.e. natural join is more general and therefore more useful.

Consider these examples (Tutorial D):

WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
     RELATION { TUPLE { X 1 } , TUPLE { X 2 } } AS R2 :
R1 JOIN R2

returns the product of the relations i.e. degree of two (i.e. two attributes, X and Y) and cardinality of 6 (2 x 3 = 6 tuples).

However,

WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
     RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } } AS R2 :
R1 JOIN R2

returns the natural join of the relations i.e. degree of one (i.e. the set union of the attributes yielding one attribute Y) and cardinality of 2 (i.e. only the tuples whose Y value appears in both relations, Y 1 and Y 2, with no duplicate tuples).
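For readers without a Tutorial D implementation to hand, the two results above can be reproduced with a rough Python sketch. The `relation` and `natural_join` helpers are hypothetical names invented for this illustration (they are not part of Tutorial D or any library); a relation is modelled as a set of tuples, and each tuple as a frozenset of (attribute, value) pairs.

```python
def relation(*tuples):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(t.items()) for t in tuples}


def natural_join(r1, r2):
    """Combine every pair of tuples that agree on all attribute names they share."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            common = d1.keys() & d2.keys()
            # With no common attributes this test is vacuously true,
            # so every pairing is kept and the join behaves as a product.
            if all(d1[a] == d2[a] for a in common):
                result.add(frozenset({**d1, **d2}.items()))
    return result


r1 = relation({"Y": 1}, {"Y": 2}, {"Y": 3})
r2_x = relation({"X": 1}, {"X": 2})   # no attribute in common with r1
r2_y = relation({"Y": 1}, {"Y": 2})   # shares attribute Y with r1

product_like = natural_join(r1, r2_x)
joined = natural_join(r1, r2_y)

print(len(product_like))  # 6 tuples, each with attributes X and Y
print(len(joined))        # 2 tuples, each with the single attribute Y
```

Because the result is a Python set, duplicate tuples cannot occur, which mirrors the relational requirement rather than SQL's behaviour.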

I hope the above examples explain why your statement "that a set of 13 X a set of 4 = 52" is not strictly correct.

Similarly, Tutorial D does not include a theta join operator. This is essentially because other operators (e.g. natural join and restriction) make it both unnecessary and not terribly useful. In contrast, Codd's primitive operators included product and restriction which can be used to perform a theta join.
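As a minimal sketch of that last point, a theta join can be written as a product followed by a restriction. The `theta_join` function below is a hypothetical helper made up for this illustration; relations are modelled as lists of dicts, and the sketch assumes the two inputs have no attribute names in common and contain no duplicate tuples.

```python
from itertools import product


def theta_join(r1, r2, theta):
    """Product of r1 and r2, restricted to the pairs for which theta holds."""
    return [{**t1, **t2} for t1, t2 in product(r1, r2) if theta(t1, t2)]


r1 = [{"Y": 1}, {"Y": 2}, {"Y": 3}]
r2 = [{"X": 1}, {"X": 2}]

# theta is the comparison Y > X; any comparison operator could be used here.
result = theta_join(r1, r2, lambda t1, t2: t1["Y"] > t2["X"])
print(result)
# [{'Y': 2, 'X': 1}, {'Y': 3, 'X': 1}, {'Y': 3, 'X': 2}]
```

Of the 6 tuples in the product, only the 3 that satisfy the condition survive the restriction.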

SQL has an explicit product operator named CROSS JOIN which forces the result to be the product even if it entails violating 1NF by creating duplicate columns (attributes). Consider the SQL equivalent to the latter Tutorial D example above:

WITH R1 AS (SELECT * FROM (VALUES (1), (2), (3)) AS T (Y)), 
     R2 AS (SELECT * FROM (VALUES (1), (2)) AS T (Y))
SELECT * 
  FROM R1 CROSS JOIN R2;

This returns a table expression with two columns (rather than one attribute), both called Y (!!), and 6 rows, i.e. the same result as this:

SELECT c1 AS Y, c2 AS Y 
  FROM (VALUES (1, 1), 
               (2, 1), 
               (3, 1), 
               (1, 2), 
               (2, 2), 
               (3, 2)
       ) AS T (c1, c2);

** That is, although there is only one relational model (i.e. Codd's), there can be more than one relational algebra (i.e. Codd's is but one).

answered Sep 30 '22 07:09

onedaywhen

You're not quite right: a theta join is a join whose condition may use a comparison operator other than =. In SQL, that is typically < or >= etc. See TechNet
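A small runnable illustration, using Python's built-in sqlite3 module so the SQL can be tried without a database server. The `car` and `boat` tables and their contents are invented for this example; the join condition uses >= rather than =, which is what makes it a theta join rather than an equijoin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE car  (model TEXT, price INTEGER);
    CREATE TABLE boat (model TEXT, price INTEGER);
    INSERT INTO car  VALUES ('CarA', 20000), ('CarB', 30000), ('CarC', 50000);
    INSERT INTO boat VALUES ('Boat1', 10000), ('Boat2', 40000), ('Boat3', 60000);
""")

# Pair each car with every boat that costs no more than the car.
rows = conn.execute("""
    SELECT car.model AS car_model, boat.model AS boat_model
      FROM car
      JOIN boat ON car.price >= boat.price
     ORDER BY car_model, boat_model
""").fetchall()

print(rows)
# [('CarA', 'Boat1'), ('CarB', 'Boat1'), ('CarC', 'Boat1'), ('CarC', 'Boat2')]
```

Replacing >= with = in the ON clause would turn this into an equijoin, which returns no rows for this data because no car and boat share a price.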

As for the Cartesian product (or CROSS JOIN), it is an operation rather than an idea or concept. It's important because sometimes you need to use it! It is a basic fact that a set of 13 x a set of 4 = 52, and the Cartesian product is based on this fact.
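To make the 13 x 4 = 52 arithmetic concrete, here is a short sketch that reads the two sets as card ranks and suits (an assumption; the question does not say what the sets contain) and uses the standard-library `itertools.product` to form every pairing.

```python
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]

# The set Cartesian product: every (rank, suit) pair, one per card.
deck = set(product(ranks, suits))

print(len(ranks), len(suits), len(deck))  # 13 4 52
```

The operation produces the 52 pairs themselves, not just the count; the count is simply a consequence of pairing every element of one set with every element of the other.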

answered Sep 30 '22 07:09

Kirk Broadhurst

Tags: sql, relational-database, cartesian-product, relational-algebra
