I do outer joins on a single column in Pig like this:
result = JOIN A by id LEFT OUTER, B by id;
How do I join on two columns, something like:
WHERE A.id=B.id AND A.name=B.name
What is the Pig equivalent? I couldn't find any example in the Pig manuals. Any help?
Here is how you can perform a JOIN operation on two relations using multiple keys:
grunt> Relation3_name = JOIN Relation1_name BY (key1, key2), Relation2_name BY (key1, key2);
Inner join is one of the most frequently used joins. It returns only the rows whose key values match in both relations; with multiple keys, every listed key must be equal. Because the match condition is equality of key values, this kind of join is also called an equijoin.
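To make the matching rule concrete, here is a minimal Python sketch of an inner equijoin on a composite key. It only models the semantics of `JOIN A BY (id, name), B BY (id, name)` on small in-memory lists; the function name `inner_join` and the sample rows are hypothetical, and this is not how Pig actually executes joins.

```python
from collections import defaultdict

def inner_join(left, right, keys):
    """Return (l, r) pairs whose key fields are all equal; unmatched rows are dropped."""
    # Index the right side by its composite key, e.g. (id, name).
    index = defaultdict(list)
    for r in right:
        index[tuple(r[k] for k in keys)].append(r)
    result = []
    for l in left:
        # .get does not insert a default entry into the defaultdict.
        for r in index.get(tuple(l[k] for k in keys), []):
            result.append((l, r))
    return result

A = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}, {"id": 3, "name": "cy"}]
B = [{"id": 1, "name": "ann", "score": 90}, {"id": 2, "name": "bo", "score": 70}]

for l, r in inner_join(A, B, ("id", "name")):
    print(l, r)
```

Only the (1, "ann") pair appears: the rows with id 2 are not joined because their names differ, even though the ids match, and id 3 has no partner at all.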
Now, you can use 'C' as the 'empty relation' that has one empty tuple.
The following macro builds a relation from a comma-separated string:
DEFINE GenerateRelationFromString(string) RETURNS relation {
    temp = LOAD 'somefile';
    tempLimit1 = LIMIT temp 1;
    $relation = FOREACH tempLimit1 GENERATE FLATTEN(TOKENIZE('$string', ','));
};
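The `LIMIT temp 1` step matters because the FOREACH emits the tokens once per input row; limiting the loaded file to a single row means each token appears exactly once. The Python sketch below models that behavior. `generate_relation_from_string` and its parameters are hypothetical, and it assumes TOKENIZE skips empty tokens (as `java.util.StringTokenizer` does). It also shows that an empty 'somefile' would yield an empty relation.

```python
def generate_relation_from_string(s, input_rows, delimiters=","):
    """Model of the GenerateRelationFromString macro (hypothetical helper).

    input_rows stands in for the rows loaded from 'somefile'. Returns a list
    of single-field tuples, one per non-empty token of s.
    """
    limited = input_rows[:1]   # tempLimit1 = LIMIT temp 1;
    result = []
    for _row in limited:       # FOREACH tempLimit1 GENERATE ...
        tokens = [s]
        for d in delimiters:   # TOKENIZE: split on each delimiter character
            tokens = [piece for t in tokens for piece in t.split(d)]
        # FLATTEN: each token in the bag becomes its own row.
        # Assumption: empty tokens are skipped.
        result.extend((tok,) for tok in tokens if tok)
    return result

print(generate_relation_from_string("a,b,,c", [("line1",), ("line2",)]))
# [('a',), ('b',), ('c',)]
```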
According to the Pig documentation: The FLATTEN operator looks like a UDF syntactically, but it is actually an operator that changes the structure of tuples and bags in a way that a UDF cannot. Flatten un-nests tuples as well as bags. The idea is the same, but the operation and result is different for each type of structure.
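The difference between flattening a tuple and flattening a bag can be sketched in Python, modelling a Pig tuple as a Python tuple and a bag as a list of tuples. The `flatten` helper is hypothetical and only mirrors the documented behavior: a flattened tuple has its fields spliced into the row, while a flattened bag produces one output row per inner tuple (a cross product when several bags are flattened, and no rows at all for an empty bag).

```python
from itertools import product

def flatten(row, positions):
    """Apply FLATTEN to the fields of `row` at the given positions (hypothetical model)."""
    choices = []
    for i, field in enumerate(row):
        if i in positions and isinstance(field, tuple):
            choices.append([field])      # tuple: splice its fields in place
        elif i in positions and isinstance(field, list):
            choices.append(field)        # bag: one output row per inner tuple
        else:
            choices.append([(field,)])   # field left untouched
    # Cross product of all choices; concatenate the pieces of each combination.
    return [sum(combo, ()) for combo in product(*choices)]

print(flatten((1, (2, 3)), {1}))            # tuple: one row
print(flatten((1, [(2, 3), (4, 5)]), {1}))  # bag: one row per inner tuple
print(flatten((1, []), {1}))                # empty bag: row disappears
```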
The answer above actually performs an INNER join. For a left outer join on both columns, the Pig statement should be:
result = JOIN A BY (id, name) LEFT OUTER, B BY (id, name);
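The practical difference from the inner join is which rows survive. In the Python sketch below (a model of the semantics, not Pig's implementation; `left_outer_join` and the sample data are hypothetical), every row of the left relation is kept, and a row with no match on both id and name gets nulls, represented as None, for all of the right relation's fields.

```python
from collections import defaultdict

def left_outer_join(left, right, keys, right_fields):
    """Keep every left row; pair it with each right row whose key fields all match."""
    index = defaultdict(list)
    for r in right:
        index[tuple(r[k] for k in keys)].append(r)
    null_right = dict.fromkeys(right_fields)  # every right field None, like Pig's nulls
    result = []
    for l in left:
        matches = index.get(tuple(l[k] for k in keys))  # .get does not insert a default
        if matches:
            result.extend((l, r) for r in matches)
        else:
            result.append((l, dict(null_right)))
    return result

A = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}, {"id": 3, "name": "cy"}]
B = [{"id": 1, "name": "ann", "score": 90}, {"id": 2, "name": "bo", "score": 70}]

for l, r in left_outer_join(A, B, ("id", "name"), ("id", "name", "score")):
    print(l["id"], l["name"], r["score"])
# 1 ann 90
# 2 bob None
# 3 cy None
```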