I'm reading in a CSV file that contains fields with numbers like this: "3". Can I convert these fields from "3" to 3 with Pig Latin? I need to do that so I can use the SUM() function.
Thanks for your help!
What about just removing the " characters with REPLACE?
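For example:

```
-- USING PigStorage(',') assumes a comma-separated input file
data =
    LOAD 'data.txt' USING PigStorage(',') AS (num:CHARARRAY);
numbers =
    FOREACH data
    GENERATE
        -- drop the double quotes, then cast the remaining string to an int
        (INT) REPLACE(num, '\\"', '') AS num;
```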
Then you can GROUP and SUM.
One advantage is that REPLACE returns a plain string, so you can cast the result directly to a number without having to deal with bags (as you would with TOKENIZE). REGEX_EXTRACT could be used in the same way.
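To check the logic of the REPLACE and REGEX_EXTRACT variants outside Pig, here is a rough plain-Python stand-in. The function names are hypothetical, `re.sub` and `re.search` play the part of Pig's regex-based REPLACE and REGEX_EXTRACT, and `None` stands in for a Pig null. It is a sketch of the idea under those assumptions, not Pig itself.

```python
import re

# Plain-Python stand-ins for the Pig steps, for checking the logic on sample values.


def replace_and_cast(field):
    r"""Like (INT) REPLACE(num, '\\"', ''): delete every double quote, then cast."""
    return int(re.sub('"', '', field))


def regex_extract_and_cast(field):
    r"""Like casting REGEX_EXTRACT(num, '(-?\\d+)', 1): take the first integer found.

    Returns None when nothing matches, standing in for a Pig null.
    """
    match = re.search(r'(-?\d+)', field)
    return int(match.group(1)) if match else None


def grand_total(fields, convert):
    """Like GROUP ... ALL followed by SUM: add up the converted values, skipping None."""
    return sum(v for v in (convert(f) for f in fields) if v is not None)


rows = ['"3"', '"4"', '"10"']
print(grand_total(rows, replace_and_cast))        # 17
print(grand_total(rows, regex_extract_and_cast))  # 17
```

Both conversions give the same total on well-formed input; the regex version is a bit more forgiving, since it simply returns `None` for a field with no digits instead of failing.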
The TOKENIZE function splits a string on several characters that it treats as word separators, one of which is the double quote. So if you tokenize "3", the quotes are consumed as separators and the resulting bag should contain just 3, which you would then FLATTEN before casting.
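A quick way to see what TOKENIZE does with a quoted value is to mimic it in Python. This sketch assumes the default separators I believe Pig's TOKENIZE uses (space, double quote, comma, parentheses and star) and that, as with `java.util.StringTokenizer`, empty tokens are dropped; `tokenize` is a hypothetical helper, not a Pig API.

```python
import re

# Assumed default TOKENIZE separators: space, double quote, comma, parentheses, star.
PIG_TOKENIZE_DELIMS = ' ",()*'


def tokenize(text, delims=PIG_TOKENIZE_DELIMS):
    """Split on any run of delimiter characters and drop empty tokens."""
    pattern = "[" + re.escape(delims) + "]+"
    return [tok for tok in re.split(pattern, text) if tok]


print(tokenize('"3"'))  # ['3']: the quotes disappear, leaving a single token
```

Because only one token survives, there is no "middle" element to pick out; the bag just needs flattening.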
You could write a UDF that strips the surrounding quotes, or use JacobM's approach.
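If you go the UDF route, Pig can also run Python functions through Jython. The sketch below assumes that setup: the file name `strip_udf.py`, the namespace `udfs` and the function `strip_quotes` are all hypothetical, and the `try`/`except` fallback exists only so the file also runs under plain Python, where Pig's `outputSchema` decorator is not defined. It returns a chararray, so the cast to int still happens in the Pig script.

```python
# strip_udf.py (hypothetical file name)
# Registered in Pig with something like:
#   REGISTER 'strip_udf.py' USING jython AS udfs;
#   numbers = FOREACH data GENERATE (int) udfs.strip_quotes(num);

try:
    outputSchema  # supplied by Pig when the script runs as a Jython UDF
except NameError:
    def outputSchema(schema):
        """No-op stand-in so the module also runs under plain Python."""
        def wrap(func):
            return func
        return wrap


@outputSchema("num:chararray")
def strip_quotes(value):
    """Remove leading/trailing double quotes; pass nulls (None) through."""
    if value is None:
        return None
    return value.strip('"')
```

The code avoids Python 3-only syntax, since Pig's Jython runtime is Python 2 based.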
Either way, you should afterwards cast the chararray '3' to an int: (int)$1 or (int)myvalue. That way you can use SUM.
http://pig.apache.org/docs/r0.5.0/piglatin_reference.html#Cast+Operators
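The cast matters because of how Pig treats values it cannot convert: as I understand it, casting a chararray such as '"3"' to int gives null (with a warning) rather than an error, and SUM skips nulls, so forgetting to strip the quotes can silently shrink the total. The two helpers below are rough, hypothetical Python stand-ins for that behaviour, with `None` playing the role of null.

```python
def pig_int_cast(value):
    """Rough stand-in for Pig's (int) cast: None instead of an error on bad input."""
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def pig_sum(values):
    """Rough stand-in for Pig's SUM: ignore None; None if nothing is left."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


raw = ['3', '"4"', '5']
print(pig_sum([pig_int_cast(v) for v in raw]))  # 8: the quoted "4" became None
```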