What is the difference between GROUP and COGROUP in PIG?

I understood that GROUP did not work with multiple relations, which is why Pig has COGROUP. However, when I checked today, GROUP worked for me with two relations. I am using Pig 0.12.0. My commands and their output are as follows.

grunt> grpvar = GROUP C by $2, B by $2;
grunt> cogrpvar = COGROUP C by $2, B by $2;
grunt> describe grpvar;

grpvar: {group: chararray,C: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)},B: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)}}

grunt> describe cogrpvar;

cogrpvar: {group: chararray,C: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)},B: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)}}

Is GROUP expected to work like this? What is the difference between GROUP and COGROUP?

proutray asked Jul 30 '14 04:07



1 Answer

Yes, GROUP is supposed to work like that.

According to the documentation (http://pig.apache.org/docs/r0.12.0/basic.html#group):

Note: The GROUP and COGROUP operators are identical. Both operators work with one or more relations. For readability GROUP is used in statements involving one relation and COGROUP is used in statements involving two or more relations. You can COGROUP up to but no more than 127 relations at a time.

So the distinction is purely for readability; there is no functional difference between the two operators.
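To make the shared output shape concrete, here is a rough Python model of what either operator produces when given two relations. It is only a sketch of the semantics, not how Pig implements it: the `cogroup` function, the sample rows, and the sorted key order are assumptions made for illustration, and Pig's special handling of null keys is not modeled.

```python
from collections import defaultdict

def cogroup(relations, key_index):
    """Model GROUP/COGROUP over one or more named relations.

    relations: dict mapping a relation name to a list of tuples.
    key_index: position of the grouping field (2 corresponds to $2).
    Returns one (group, bag, bag, ...) tuple per distinct key,
    with one bag per relation in the order the relations were given.
    """
    names = list(relations)
    # Every key starts with an empty bag for each relation, so a key that
    # is missing from one relation still yields an empty bag for it.
    grouped = defaultdict(lambda: {name: [] for name in names})
    for name, tuples in relations.items():
        for t in tuples:
            grouped[t[key_index]][name].append(t)
    # Sorting is only for a stable result here; Pig makes no such promise.
    return [(key, *(bags[name] for name in names))
            for key, bags in sorted(grouped.items())]

# Hypothetical rows following the schema in the question:
# (pid, pname, drug, gender, tot_amt)
C = [(1, "ann", "aspirin", "F", 10), (2, "bob", "ibuprofen", "M", 20)]
B = [(3, "cat", "aspirin", "F", 30)]

result = cogroup({"C": C, "B": B}, key_index=2)
for row in result:
    print(row)
```

Each row has the same shape as the DESCRIBE output in the question: the group key first, then one bag per input relation. The same structure comes back whether the statement is written with GROUP or with COGROUP.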

Tibo R answered Sep 20 '22 13:09