
Convert "3" to 3 with PigLatin

I am reading a CSV file that contains quoted numeric fields such as "3". Can I convert these fields from "3" to 3 with Pig Latin? I need this in order to use the SUM() function.

Thanks for your help!

Christoph asked Dec 08 '10 16:12


3 Answers

What about just removing the " with REPLACE?

For example:

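-- Load each line as a string, e.g. "3" (quotes included)
data =
    LOAD 'data.txt' AS (num:CHARARRAY);

-- Strip the double quotes, then cast the remaining string to an int
numbers =
    FOREACH data
    GENERATE
        (INT) REPLACE(num, '\\"', '') AS num;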

Then you can GROUP and SUM.
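
For completeness, here is a minimal sketch of the GROUP and SUM step. It assumes data.txt holds one quoted number per line and that you want a single grand total; the aliases grouped, total and sum_of_nums are just illustrative names.

```pig
-- Assumption: data.txt contains one quoted number per line, e.g. "3"
data = LOAD 'data.txt' AS (num:chararray);

-- Strip the quotes and cast, naming the result so SUM can refer to it
numbers = FOREACH data GENERATE (int) REPLACE(num, '\\"', '') AS num;

-- GROUP ... ALL puts every row into one group, giving a single total
grouped = GROUP numbers ALL;
total = FOREACH grouped GENERATE SUM(numbers.num) AS sum_of_nums;

DUMP total;
```

If you need one sum per key instead, load the key as a second field and use GROUP numbers BY that field rather than GROUP numbers ALL.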

One advantage is that REPLACE returns a plain string, which you can cast directly to a number, so there are no bags to deal with. REGEX_EXTRACT could be used to do the same thing.
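
A rough sketch of the REGEX_EXTRACT variant, assuming a Pig version that ships REGEX_EXTRACT as a built-in and fields that are plain integers wrapped in double quotes. The pattern is my own guess at what fits the data in the question:

```pig
-- Assumption: every field looks like "3" or "-12" (an integer in double quotes)
data = LOAD 'data.txt' AS (num:chararray);

-- Capture group 1 holds the digits between the quotes; cast it to int
numbers = FOREACH data
          GENERATE (int) REGEX_EXTRACT(num, '"(-?\\d+)"', 1) AS num;
```

Fields that do not match the pattern come back as null instead of failing the job, and SUM ignores nulls, so it may be worth checking for them first.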

Romain answered Nov 18 '22 01:11


The TOKENIZE function splits a string on various characters it considers word separators, one of which is the double quote. So if you tokenize "3", the token left over should be just 3. Note that TOKENIZE returns a bag, so the token has to be flattened out of it before it can be cast.
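
A sketch of how that could look, assuming each field holds exactly one quoted number (the aliases tokens and token are illustrative):

```pig
-- Assumption: every field is a single quoted number such as "3"
data = LOAD 'data.txt' AS (num:chararray);

-- TOKENIZE returns a bag of tokens; FLATTEN turns each token into its own row
tokens = FOREACH data GENERATE FLATTEN(TOKENIZE(num)) AS token:chararray;

-- With the quotes gone, the token can be cast to int
numbers = FOREACH tokens GENERATE (int) token AS num;
```

Be aware that TOKENIZE also splits on spaces and commas, so a field containing either would produce several rows rather than one.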

Jacob Mattison answered Nov 18 '22 00:11


You could write a UDF that strips the surrounding quotes, or use JacobM's approach.

Either way, afterwards you should cast the chararray '3' to an int: (int)$1 or (int)myvalue. This way you can use SUM.

http://pig.apache.org/docs/r0.5.0/piglatin_reference.html#Cast+Operators

Donald Miner answered Nov 17 '22 23:11