
Pig: apply a FOREACH operator to each element within a bag

Tags:

apache-pig

Example: I have a relation "class", with a nested bag of students:

class: {teacher_name: chararray, students: {(firstname: chararray, lastname: chararray)}}

I want to perform an operation on each student while leaving the overall structure untouched, i.e., obtain:

class: {teacher_name: chararray, students: {(fullname: chararray)}}

where for each student, fullname = CONCAT(firstname, lastname)

My understanding is that a nested FOREACH would not be my solution here, as it still generates only one record per input tuple, whereas I want something that applies to each item within the bag.

This is pretty easy to do with a UDF, but I wondered whether it is possible in pure Pig Latin.

Zorglub asked Aug 24 '12 09:08



1 Answer

In Pig 0.10 this is possible without a UDF, because a FOREACH can be nested inside another FOREACH. Here is an example:

inpt = load '~/pig/data/bag_concat.dat' as (k : chararray, c1 : chararray, c2 : chararray);
dump inpt;
1   q   w
1   s   d
2   q   a
2   t   y
2   u   i
2   o   p

bags = group inpt by k;
describe bags;

bags: {group: chararray,inpt: {(k: chararray,c1: chararray,c2: chararray)}}

result = foreach bags {
    concat = foreach inpt generate CONCAT(c1, c2); --it will iterate only over the records of the inpt bag
    generate group, concat;
};
dump result;

(1,{(qw),(sd)})
(2,{(qa),(ty),(ui),(op)})
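
Applied to the relation from the question, the same pattern might look like the sketch below. It assumes a relation named class is already loaded with the schema shown in the question; the aliases fullnames and result are made up for illustration, and the script has not been run against real data.

```pig
-- Sketch only: assumes `class` is already loaded with the schema
--   {teacher_name: chararray, students: {(firstname: chararray, lastname: chararray)}}
-- `fullnames` and `result` are illustrative alias names.
result = foreach class {
    -- the inner foreach iterates over the tuples of the students bag only
    fullnames = foreach students generate CONCAT(firstname, lastname) as fullname;
    -- one output record per input record; the bag is rebuilt with the new field
    generate teacher_name, fullnames as students;
};
describe result;
```

CONCAT joins the two names with no separator. If a space is wanted, nesting the call, as in CONCAT(firstname, CONCAT(' ', lastname)), should work.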
alexeipab answered Nov 20 '22 01:11