I have a sample input of tab-separated key/value pairs, as follows:
B_1001@2012-06-15 [email protected]
B_1001@2012-06-18 [email protected]
B_1002@2012-09-26 [email protected]
B_1002@2012-09-28 [email protected]
I am loading this file into Pig and doing the following:
a = load '/home/HadoopUser/Desktop/a.txt' as (key:chararray, value:chararray);
describe a;
a: {key: chararray,value: chararray}
b = foreach a generate key, flatten(STRSPLIT(value,'@',2)) as (v1:double,v2:float);
describe b;
b: {key: chararray,v1: double,v2: float}
c = group b by key;
describe c;
c: {group: chararray,b: {key: chararray,v1: double,v2: float}}
This works up to here, but when I do arithmetic on b.v1 I get a ClassCastException: java.lang.String cannot be cast to java.lang.Double.
describe, however, reports no error:
d = foreach c generate group,SUM(b.v1);
describe d;
d: {group: chararray,double}
When I dump d; it throws the exception.
I even tried typecasting 'b' as well:
b = foreach a generate key, (tuple (double,double))STRSPLIT(value,'@',2);
Now describe b; gives the error: Cannot cast tuple with schema tuple to tuple with schema tuple({double,double})
Please help me understand why this happens even though describe shows the correct schema.
I have run into this issue before as well. I can't find the bug tracker link for it right now, but when you set the type (a 'cast') with a statement like
B = FOREACH A GENERATE key AS key: chararray;
it will not actually cast the type (though it will change the output of DESCRIBE). You are right that you'll have to do an explicit cast, and the docs say that you can cast a chararray to a double. Try something like:
b1 = FOREACH b GENERATE key, (double)v1, (float)v2 ;
Update: Here is the link to the bug: https://issues.apache.org/jira/browse/PIG-2315
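For reference, here is a minimal sketch of how the explicit cast might fit into the whole pipeline from the question. It declares the split fields as chararray first (an assumption on my part: STRSPLIT produces strings, so that declaration matches the runtime type) and only then casts them. The alias b1 comes from the snippet above; the field name total is just illustrative. I have not run this against the original data, so treat it as a starting point rather than a verified fix.

```pig
-- Load the tab-separated key/value pairs as plain strings.
a = LOAD '/home/HadoopUser/Desktop/a.txt' AS (key:chararray, value:chararray);

-- Split the value on '@'. The fields are declared as chararray here,
-- because the AS clause only labels the schema and does not convert the data.
b = FOREACH a GENERATE key, FLATTEN(STRSPLIT(value, '@', 2)) AS (v1:chararray, v2:chararray);

-- Explicit casts perform the actual chararray -> double/float conversion.
b1 = FOREACH b GENERATE key, (double)v1 AS v1, (float)v2 AS v2;

-- Group and aggregate on the fields that were really converted.
c = GROUP b1 BY key;
d = FOREACH c GENERATE group, SUM(b1.v1) AS total;
DUMP d;
```

If a value cannot be parsed as a number, the cast should yield null for that field rather than throw, so it may be worth checking the output for unexpected nulls.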