If one have data like those:
A = LOAD 'data' AS (a1:int,a2:int,a3:int);
DUMP A;
(1,2,3)
(4,2,1)
And then a cross-join is done on A, A:
B = CROSS A, A;
DUMP B;
(1,2,3)
(4,2,1)
Why is second A optimized out from the query?
info: pig version 0.11
== UPDATE ==
If I sort A like:
C = ORDER A BY a1;
D = CROSS A, C;
It will give a correct cross-join.
Here is how you can perform a JOIN operation on two tables using multiple keys. grunt> Relation3_name = JOIN Relation2_name BY (key1, key2), Relation3_name BY (key1, key2);
Now, you can use 'C' as the 'empty relation' that has one empty tuple. Show activity on this post. DEFINE GenerateRelationFromString(string) RETURNS relation { temp = LOAD 'somefile'; tempLimit1 = LIMIT temp 1; $relation = FOREACH tempLimit1 GENERATE FLATTEN(TOKENIZE('$string', ',')); };
There is no table concept in PIG. The closest to a table is a Relation. "A Pig relation is similar to a table in a relational database, where the tuples in the bag correspond to the rows in a table.
davek is correct -- you cannot CROSS
(or JOIN
) a relation with itself. If you wish to do this, you must create a copy of the data. In this case, you can use another LOAD
statement. If you want to do this with a relation further down a pipeline, you'll need to duplicate it using FOREACH
.
I have several macros that I use frequently and IMPORT
by default in all of my Pig scripts in case I need them. One is used for just this purpose:
DEFINE DUPLICATE(in) RETURNS out
{
$out = FOREACH $in GENERATE *;
};
This will work for you wherever in your pipeline you need a duplicate:
A1 = LOAD 'data' AS (a1:int,a2:int,a3:int);
A2 = DUPLICATE(A1);
B = CROSS A1, A2;
Note that even though A1
and A2
are identical, you cannot assume that the records are in the same order. But if you are doing a CROSS
or JOIN
, this probably doesn't matter.
I think you have to load the data twice to achieve what you want.
i.e.
A1 = LOAD 'data' AS (a1:int,a2:int,a3:int);
A2 = LOAD 'data' AS (a1:int,a2:int,a3:int);
B = CROSS A1, A2;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With