I'm looking for a clear, basic explanation of the concept of theta join in relational algebra and perhaps an example (using SQL perhaps) to illustrate its usage.
If I understand it correctly, the theta join is a natural join with a condition added in. So, whereas the natural join enforces equality between attributes of the same name (and removes the duplicate?), the theta join does the same thing but adds in a condition. Do I have this right? Any clear explanation, in simple terms (for a non-mathmetician) would be greatly appreciated.
Also (sorry to just throw this in at the end, but its sort of related), could someone explain the importance or idea of cartesian product? I think I'm missing something with regard to the basic concept, because to me it just seems like a restating of a basic fact, i.e that a set of 13 X a set of 4 = 52...
Theta (θ) Join Theta join combines tuples from different relations provided they satisfy the theta condition. The join condition is denoted by the symbol θ.
Theta Join allows you to merge two tables based on the condition represented by theta. When a theta join uses only equivalence condition, it becomes an equi join. Natural join does not utilize any of the comparison operators. An outer join doesn't require each record in the two join tables to have a matching record.
A theta join allows for arbitrary comparison relationships (such as ≥). An equijoin is a theta join using the equality operator. A natural join is an equijoin on attributes that have the same name in each relationship.
Because a relation has no attributes with duplicate names by definition, relational operations theta join and natural join will both "remove the duplicate attributes." [A big problem with posting examples in SQL to explain relation operations, as you requested, is that the result of a SQL query is not a relation ...
Leaving SQL aside for a moment...
A relational operator takes one or more relations as parameters and results in a relation. Because a relation has no attributes with duplicate names by definition, relational operations theta join and natural join will both "remove the duplicate attributes." [A big problem with posting examples in SQL to explain relation operations, as you requested, is that the result of a SQL query is not a relation because, among other sins, it can have duplicate rows and/or columns.]
The relational Cartesian product operation (results in a relation) differs from set Cartesian product (results in a set of pairs). The word 'Cartesian' isn't particularly helpful here. In fact, Codd called his primitive operator 'product'.
The truly relational language Tutorial D lacks a product operator and product is not a primitive operator in the relational algebra proposed by co-author of Tutorial D, Hugh Darwen**. This is because the natural join of two relations with no attribute names in common results in the same relation as the product of the same two relations i.e. natural join is more general and therefore more useful.
Consider these examples (Tutorial D):
WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
RELATION { TUPLE { X 1 } , TUPLE { X 2 } } AS R2 :
R1 JOIN R2
returns the product of the relations i.e. degree of two (i.e. two attributes, X
and Y
) and cardinality of 6 (2 x 3 = 6 tuples).
However,
WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } } AS R2 :
R1 JOIN R2
returns the natural join of the relations i.e. degree of one (i.e. the set union of the attributes yielding one attribute Y
) and cardinality of 2 (i.e. duplicate tuples removed).
I hope the above examples explain why your statement "that a set of 13 X a set of 4 = 52" is not strictly correct.
Similarly, Tutorial D does not include a theta join operator. This is essentially because other operators (e.g. natural join and restriction) make it both unnecessary and not terribly useful. In contrast, Codd's primitive operators included product and restriction which can be used to perform a theta join.
SQL has an explicit product operator named CROSS JOIN
which forces the result to be the product even if it entails violating 1NF by creating duplicate columns (attributes). Consider the SQL equivalent to the latter Tutoral D exmaple above:
WITH R1 AS (SELECT * FROM (VALUES (1), (2), (3)) AS T (Y)),
R2 AS (SELECT * FROM (VALUES (1), (2)) AS T (Y))
SELECT *
FROM R1 CROSS JOIN R2;
This returns a table expression with two columns (rather than one attribute) both called Y
(!!) and 6 rows i.e. this
SELECT c1 AS Y, c2 AS Y
FROM (VALUES (1, 1),
(2, 1),
(3, 1),
(1, 2),
(2, 2),
(3, 2)
) AS T (c1, c2);
** That is, although there is only one relational model (i.e. Codd's), there can be more than one relational algebra (i.e. Codd's is but one).
You're not quite right - a theta join is a join which may include a condition other than =
- in SQL, typically <
or >=
etc. See TechNet
As for cartesian product (or CROSS JOIN
), it is an operation rather than an idea or concept. It's important because sometimes you need to use it! It is a basic fact that set of 13 x set of 4 = 52
, and cartesian product is based on this fact.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With