
StrSplit in Pig functions

Tags:

apache-pig

Can someone explain how to get the output below in a Pig script?

My input file is below.

a.txt

aaa.kyl,data,data
bbb.kkk,data,data
cccccc.hj,data,data
qa.dff,data,data

I am writing the Pig script like this:

A = LOAD 'a.txt' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray);
B = FOREACH A GENERATE FLATTEN(STRSPLIT(a1)),a2,a3;

I don't know how to proceed from here. I need output like the one below. Basically, I need all the characters after the dot symbol in the first atom.

(kyl,data,data)
(kkk,data,data)
(hj,data,data)
(dff,data,data)

Can someone give me the code for this?

Surender Raja asked Jul 27 '14 13:07


2 Answers

You can do this without STRSPLIT(), using SUBSTRING() and INDEXOF() as follows:

A = LOAD 'C:\\Users\\Ren\\Desktop\\file' USING PigStorage(',') AS (a1:chararray, a2:chararray, a3:chararray);
B = FOREACH A GENERATE SUBSTRING(a1, INDEXOF(a1, '.', 0) + 1, (int)SIZE(a1)), a2, a3;
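
To see what the SUBSTRING/INDEXOF expression computes, here is a minimal plain-Java sketch of the same index arithmetic. It is only an analogue, not Pig itself: the class and method names are made up for illustration, and it assumes Pig's INDEXOF behaves like Java's `String.indexOf` (returning -1 when the character is absent).

```java
public class AfterDot {
    // Java analogue of SUBSTRING(a1, INDEXOF(a1, '.', 0) + 1, SIZE(a1)).
    static String afterDot(String s) {
        // indexOf returns -1 when there is no dot, so start becomes 0
        // and the whole string is returned unchanged.
        int start = s.indexOf('.', 0) + 1;
        return s.substring(start, s.length());
    }

    public static void main(String[] args) {
        // Sample values taken from the first column of a.txt in the question.
        for (String s : new String[] {"aaa.kyl", "bbb.kkk", "cccccc.hj", "qa.dff"}) {
            System.out.println(afterDot(s));
        }
    }
}
```

Note that the expression keeps everything after the first dot, so a value such as `a.b.c` would give `b.c`.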
Rengasamy answered Sep 21 '22 23:09


Here is what you need to do.

STRSPLIT takes a regular expression as its second argument, and there is an escaping problem in the Pig parsing routines when they encounter the dot, because it is considered an operator. Refer to the Dot Operator link for more information.

You can use a Unicode escape sequence for the dot instead: \u002E. However, it must also be backslash-escaped and put in a single-quoted string.

The code below will do the work for you, and you can fine-tune it as needed:

A = LOAD 'a.txt' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray);
B = FOREACH A GENERATE FLATTEN(STRSPLIT(a1,'\\u002E')) as (a1:chararray, a1of1:chararray),a2,a3;
C = FOREACH B GENERATE a1of1,a2,a3;
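
The effect of the delimiter can be checked outside Pig. The sketch below uses Java's `String.split`, on the assumption that STRSPLIT hands its delimiter to Java's regex engine in the same way; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class DotSplit {
    // Thin wrapper around String.split, which treats the delimiter as a regex.
    static String[] split(String s, String regex) {
        return s.split(regex);
    }

    public static void main(String[] args) {
        // Unescaped dot matches every character, so only empty pieces remain
        // and split drops them all.
        System.out.println(Arrays.toString(split("aaa.kyl", ".")));       // []
        // The regex escape \u002E matches only a literal dot.
        System.out.println(Arrays.toString(split("aaa.kyl", "\\u002E"))); // [aaa, kyl]
        // A backslash-escaped dot is the equivalent, more common spelling in Java.
        System.out.println(Arrays.toString(split("aaa.kyl", "\\.")));     // [aaa, kyl]
    }
}
```

In the Java source, `"\\u002E"` is the six-character string `\u002E`, which the regex engine reads as the literal character U+002E.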

Hope this helps.

Rajnish G answered Sep 22 '22 23:09