Can someone explain how to get the output below with a Pig script?
My input file is below:
a.txt
aaa.kyl,data,data
bbb.kkk,data,data
cccccc.hj,data,data
qa.dff,data,data
I have written the Pig script like this:
A = LOAD 'a.txt' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray);
B = FOREACH A GENERATE FLATTEN(STRSPLIT(a1)),a2,a3;
I don't know how to proceed from here. I need output like the following; basically, I need all characters after the dot in the first atom:
(kyl,data,data)
(kkk,data,data)
(hj,data,data)
(dff,data,data)
Can someone give me the code for this?
You can skip STRSPLIT() and use SUBSTRING() with INDEXOF() instead:
A = LOAD 'a.txt' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray);
-- keep everything after the first '.' in a1 (SIZE returns a long, so cast it to int)
B = foreach A generate SUBSTRING(a1,INDEXOF(a1,'.',0)+1,(int)SIZE(a1)),a2,a3;
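To see what the SUBSTRING/INDEXOF expression does to each row, here is a small Python model of it. It is a sketch, not Pig itself. It assumes Pig's INDEXOF behaves like Java's indexOf and returns -1 when the character is missing. The helper name `after_first_dot` is hypothetical.

```python
def after_first_dot(s: str) -> str:
    """Model of SUBSTRING(s, INDEXOF(s, '.', 0) + 1, SIZE(s))."""
    # str.find returns -1 when there is no dot, so start becomes 0 and the whole string is kept
    start = s.find(".") + 1
    return s[start:len(s)]


rows = [
    ("aaa.kyl", "data", "data"),
    ("bbb.kkk", "data", "data"),
    ("cccccc.hj", "data", "data"),
    ("qa.dff", "data", "data"),
]

for a1, a2, a3 in rows:
    print((after_first_dot(a1), a2, a3))
```

The model also shows two edge cases. A value with no dot comes back unchanged. A value with several dots such as "a.b.c" gives "b.c", because only the first dot is used.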
Here is what you need to do:
The dot has to be escaped. STRSPLIT treats its delimiter as a Java regular expression, and in a regular expression an unescaped '.' matches any character, not just a literal dot.
You can use the Unicode escape sequence for a dot instead: \u002E. Its backslash must itself be escaped, and the sequence must be written in a single-quoted string: '\\u002E'. Escaping the dot directly as '\\.' works as well.
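Python's `re` module handles the dot the same way the Java regular expressions behind STRSPLIT's delimiter do, so it makes a quick illustration. This is only a stand-in: Python and Java handle empty fields differently (Java's String.split drops trailing empty strings), so only the contrast between an escaped and an unescaped dot carries over to Pig.

```python
import re

value = "aaa.kyl"

# Unescaped '.' is a wildcard: every character counts as a delimiter,
# so only empty strings are left between the matches.
wildcard = re.split(".", value)

# An escaped dot, or the escape \u002E, matches only a literal '.'.
escaped = re.split(r"\.", value)
unicode_escaped = re.split(r"\u002E", value)

print(wildcard)
print(escaped)
print(unicode_escaped)
```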
The code below does the job, and you can adapt it as needed:
A = LOAD 'a.txt' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray);
B = FOREACH A GENERATE FLATTEN(STRSPLIT(a1,'\\u002E')) as (a1:chararray, a1of1:chararray),a2,a3;
C = FOREACH B GENERATE a1of1,a2,a3;
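Here is a rough Python model of the whole A to B to C pipeline, run on the sample data. It checks that the output matches what the question asks for. PigStorage(',') is modelled as a plain split on commas with no quoting. Like the script, it assumes each first field contains exactly one dot. The name `transform` is hypothetical.

```python
import re
from typing import List, Tuple

SAMPLE = """aaa.kyl,data,data
bbb.kkk,data,data
cccccc.hj,data,data
qa.dff,data,data
"""


def transform(text: str) -> List[Tuple[str, str, str]]:
    out = []
    for line in text.splitlines():
        # A: LOAD ... USING PigStorage(',') AS (a1, a2, a3)
        a1, a2, a3 = line.split(",")
        # B: FLATTEN(STRSPLIT(a1, '\\u002E')) AS (a1, a1of1); assumes exactly one dot
        _prefix, a1of1 = re.split(r"\u002E", a1)
        # C: GENERATE a1of1, a2, a3
        out.append((a1of1, a2, a3))
    return out


for row in transform(SAMPLE):
    print("(" + ",".join(row) + ")")
```

The printed lines have the same shape as the desired output, e.g. (kyl,data,data).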
Hope this helps.