
Cannot cast bytearray to chararray in pig

Tags: apache-pig

I have data as follows:

(000001, mfp=621|mdus=4.0|mduc=5.0|mas=1|mpc=4.0|mfn=1|country=ABC)
(00002, address=1000+mity|mus=1|name=kailtig+bksyt|mas=1|mpc=4.977552|country=ABC)

Each record consists of an identifier and a set of attributes.

I am trying to extract all the attributes from the data and perform some operations on them.

So I wrote my script as follows:

A = load 'myData.txt' using PigStorage(',') as (ID, ATTRIBUTES);
B = foreach A generate FLATTEN(STRSPLIT(ATTRIBUTES, '\\|')) ;
C = foreach B generate FLATTEN(TOBAG(*));
Dump C;

()
( mfp=621)
(mdus=4.0)
(mduc=5.0)
(mas=1)
(mpc=4.0)
(mfn=1)
(country=ABC))
( address=1000+mity)
(mus=1)
(name=kailtig+bksyt)
(mpc=4.977552)

Up to this point, everything works. The problem starts here.

When I try to perform an operation on these attributes, for example replacing 'm' with 'market':

D = foreach C generate REPLACE($0,'m','market');

I get the following error:

 Could not infer the matching function for org.apache.pig.builtin.REPLACE as 
 multiple or none of them fit. Please use an explicit cast.

When I try to cast the bytearray to chararray:

D = foreach C generate (chararray)$0;

I get this error:

 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1052:
 <line 4, column 24> Cannot cast bytearray to chararray

However, the documentation at http://pig.apache.org/docs/r0.11.1/basic.html#cast says that Pig Latin supports casting from bytearray to chararray.

How can I solve this problem? Please help.

Thanks.

bndg asked Oct 03 '22 18:10


1 Answer

I'm not sure whether you strictly need to work with byte arrays, but if you don't, you can use:

A = LOAD 'myData.txt' USING PigStorage(',') AS (id, attrs) ; 
B = FOREACH A GENERATE FLATTEN(TOKENIZE(attrs, '|')) AS attr:chararray ;
-- Now that the data is loaded as chararrays REPLACE will work 
C = FOREACH B GENERATE REPLACE(attr,'m','market') AS attrchanged ;

This way, when attrs is split and flattened, each resulting token is also declared as a chararray. In general, you probably want to declare the types ahead of time in the schema.

The schema and output from each step are as follows:

A: {id: bytearray,attrs: bytearray}
((000001, mfp=621|mdus=4.0|mduc=5.0|mas=1|mpc=4.0|mfn=1|country=ABC))
((00002, address=1000+mity|mus=1|name=kailtig+bksyt|mas=1|mpc=4.977552|country=ABC))
B: {attr: chararray}
( mfp=621)
(mdus=4.0)
(mduc=5.0)
(mas=1)
(mpc=4.0)
(mfn=1)
(country=ABC))
( address=1000+mity)
(mus=1)
(name=kailtig+bksyt)
(mas=1)
(mpc=4.977552)
(country=ABC))
C: {attrchanged: chararray}
( marketfp=621)
(marketdus=4.0)
(marketduc=5.0)
(marketas=1)
(marketpc=4.0)
(marketfn=1)
(country=ABC))
( address=1000+marketity)
(marketus=1)
(namarkete=kailtig+bksyt)
(marketas=1)
(marketpc=4.977552)
(country=ABC))
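
The advice about declaring types ahead of time can be sketched as follows. This is only an illustration, assuming both fields are plain text. It keeps id next to each attribute and, as an optional variation that was not part of the original question, replaces only a leading 'm'. REPLACE treats its second argument as a regular expression, which is why the plain 'm' pattern above also turned name into namarkete.

```pig
-- Sketch only: assumes both fields are plain text, so both are typed in the LOAD schema.
A = LOAD 'myData.txt' USING PigStorage(',') AS (id:chararray, attrs:chararray);
-- Keep id alongside each token; FLATTEN pairs it with every attribute.
B = FOREACH A GENERATE id, FLATTEN(TOKENIZE(attrs, '|')) AS attr:chararray;
-- TRIM drops the leading space seen in tokens such as ' mfp=621';
-- '^m' matches an 'm' only at the start of the trimmed token.
C = FOREACH B GENERATE id, REPLACE(TRIM(attr), '^m', 'market') AS attrchanged;
DESCRIBE C;
```

If every 'm' really should be replaced, keep the original 'm' pattern. The typed schema is the part that lets REPLACE resolve without an explicit cast.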
mr2ert answered Oct 12 '22 11:10