Example: I have a relation "class", with a nested bag of students:
class: {teacher_name: chararray,students: {(firstname: chararray, lastname: chararray)}}
I want to perform an operation on each student while leaving the overall structure untouched, i.e., obtain:
class: {teacher_name: chararray,students: {(fullname: chararray)}}
where for each student, fullname = CONCAT(firstname, lastname)
My understanding is that a nested FOREACH would not be my solution here, as it still only generates 1 record per input tuple, whereas I want something that would apply within each bag item.
This is pretty easy to do with a UDF, but I wondered whether it is possible in pure Pig Latin.
The FOREACH operator in Apache Pig applies a specified transformation to each record of a relation, generating new data from the existing columns.
In Pig 0.10 this is possible without a UDF, because a FOREACH can be nested inside another FOREACH. Here is an example:
inpt = load '~/pig/data/bag_concat.dat' as (k : chararray, c1 : chararray, c2 : chararray);
dump inpt;
1 q w
1 s d
2 q a
2 t y
2 u i
2 o p
bags = group inpt by k;
describe bags;
bags: {group: chararray,inpt: {(k: chararray,c1: chararray,c2: chararray)}}
result = foreach bags {
concat = foreach inpt generate CONCAT(c1, c2); -- iterates only over the tuples of this group's inpt bag
generate group, concat;
};
dump result;
(1,{(qw),(sd)})
(2,{(qa),(ty),(ui),(op)})
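Applied to the schema in the question, the same pattern might look like the sketch below. It has not been run against real data. The alias `classes` (standing in for the question's `class` relation), the file name `classes.dat`, and the aliases `fullnames` and `result` are assumptions made for illustration, and it needs Pig 0.10 or later for the inner FOREACH.

```pig
-- Hypothetical input path; the schema mirrors the one given in the question.
classes = LOAD 'classes.dat' AS (
    teacher_name: chararray,
    students: bag{t: (firstname: chararray, lastname: chararray)}
);

result = FOREACH classes {
    -- The inner FOREACH iterates over the tuples of this record's students bag only.
    fullnames = FOREACH students GENERATE CONCAT(firstname, lastname) AS fullname;
    -- Still one output record per input record; the bag keeps one tuple per student.
    GENERATE teacher_name, fullnames AS students;
};

DESCRIBE result;
```

The outer FOREACH still emits one record per class, which is what the question asks for. Only the contents of the nested bag change.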