I have this code in Pig (win, request and response are just tables loaded directly from filesystem):
win_request = JOIN win BY bid_id, request BY bid_id;
win_request_response = JOIN win_request BY win.bid_id, response BY bid_id;
win_group = GROUP win_request_response BY (win.campaign_id);
win_count = FOREACH win_group GENERATE group, SUM(win.bid_price);
Basically I want to sum the bid_price after joining and grouping, but I get an error:
Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.
My guess is that I'm not referring correctly to win.bid_price.
The GROUP operator is used to group the data in one or more relations. It collects the data having the same key.
The relational operators for combining and splitting relations in Pig are UNION and SPLIT.
Use the JOIN keyword to specify that the tables should be joined. Combine JOIN with other join-related keywords (e.g. INNER or OUTER) to specify the type of join.
The FLATTEN operator looks like a UDF syntactically, but it is actually an operator that changes the structure of tuples and bags in a way that a UDF cannot. FLATTEN un-nests tuples as well as bags. The idea is the same in both cases, but the operation and its result are different for each type of structure.
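When performing multiple joins, I recommend giving your fields unique names (e.g. for bid_id), so that every field can still be referenced unambiguously after the join. Note also that after GROUP, the bag holding the joined rows is named after the relation you grouped (win_request_response in your script), not after win, so SUM has to project the field from that bag. Alternatively, you can keep the original field names and use the disambiguation operator '::', but that can get pretty dirty.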
wins = LOAD '/user/hadoop/rtb/wins' USING PigStorage(',') AS (f1_w:int, f2_w:int, f3_w:chararray);
reqs = LOAD '/user/hadoop/rtb/reqs' USING PigStorage(',') AS (f1_r:int, f2_r:int, f3_r:chararray);
resps = LOAD '/user/hadoop/rtb/resps' USING PigStorage(',') AS (f1_rp:int, f2_rp:int, f3_rp:chararray);
wins_reqs = JOIN wins BY f1_w, reqs BY f1_r;
wins_reqs_reps = JOIN wins_reqs BY f1_r, resps BY f1_rp;
win_group = GROUP wins_reqs_reps BY (f3_w);
win_sum = FOREACH win_group GENERATE group, SUM(wins_reqs_reps.f2_w);
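For comparison, here is a sketch of the '::' route applied to the script from the question. The LOAD statements, paths, delimiter and field types are assumptions, since the question does not show them. The prefixed names are what I would expect DESCRIBE to report for this schema, so run DESCRIBE first and adjust them to your actual output.

```pig
-- Hypothetical LOAD statements: paths, delimiter and types are assumed.
-- bid_price is declared as a numeric type so that SUM can be resolved.
win      = LOAD 'win'      USING PigStorage(',') AS (bid_id:chararray, campaign_id:chararray, bid_price:double);
request  = LOAD 'request'  USING PigStorage(',') AS (bid_id:chararray);
response = LOAD 'response' USING PigStorage(',') AS (bid_id:chararray);

win_request = JOIN win BY bid_id, request BY bid_id;
-- After the join, fields carry their source relation as a prefix (win::bid_id, request::bid_id).
win_request_response = JOIN win_request BY win::bid_id, response BY bid_id;

-- Check the exact field names before referencing them.
DESCRIBE win_request_response;

win_group = GROUP win_request_response BY win_request::win::campaign_id;
-- The bag is named after the grouped relation, so project bid_price from it.
win_count = FOREACH win_group GENERATE group, SUM(win_request_response.win_request::win::bid_price);
```

The prefixes stack with every join, which is why unique field names are usually the more readable choice.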