
Properly loading datetime in Pig

I'm loading a TSV file that has a datetime column and a long column with:

A = LOAD 'tweets-clean.txt' USING PigStorage('\t') AS (date:datetime, userid:long);
DUMP A;

An example line of input:

Tue Feb 11 05:02:10 +0000 2014  205291417

The corresponding line of output (the date comes back empty):

, 205291417

How do I do this properly?

rcj asked Feb 26 '14 20:02



1 Answer

You'd want to load date as a chararray (date:chararray) and then convert it to a datetime using FOREACH ... GENERATE along with Pig's built-in ToDate function.

The format string follows the pattern syntax of Java's SimpleDateFormat.

A = LOAD 'tweets-clean.txt' USING PigStorage('\t') AS (date:chararray, userid:long);
B = FOREACH A GENERATE ToDate(date, '<some format string>') AS date, userid;
DUMP B;
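
If you want to try a candidate format string before putting it into ToDate, you can check it in plain Java first. The sketch below is only an illustration: the pattern 'EEE MMM dd HH:mm:ss Z yyyy' is my guess at what matches the sample line in the question, and the class and method names (TwitterDateCheck, parse, toIsoUtc) are made up for this example. It is worth confirming the pattern against your own Pig version as well, since this only exercises SimpleDateFormat itself.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class TwitterDateCheck {
    // Candidate pattern for lines like "Tue Feb 11 05:02:10 +0000 2014".
    // This is an assumption, not something given in the answer above.
    static final String PATTERN = "EEE MMM dd HH:mm:ss Z yyyy";

    // Parse with an English locale so "Tue" and "Feb" are recognised
    // regardless of the machine's default locale.
    static Date parse(String text) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN, Locale.ENGLISH);
        return fmt.parse(text);
    }

    // Render the parsed instant in UTC so the result is easy to eyeball.
    static String toIsoUtc(Date d) {
        SimpleDateFormat iso =
            new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ENGLISH);
        iso.setTimeZone(TimeZone.getTimeZone("UTC"));
        return iso.format(d);
    }

    public static void main(String[] args) throws ParseException {
        Date d = parse("Tue Feb 11 05:02:10 +0000 2014");
        System.out.println(toIsoUtc(d)); // prints 2014-02-11T05:02:10Z
    }
}
```

If the pattern parses your sample lines here, it is a reasonable candidate to pass as the second argument of ToDate in place of the placeholder above.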
Adam Shook answered Nov 05 '22 10:11
