
Pig UDF for iso to yyyy-mm-dd hh:mm:ss.000

I am looking to convert the ISO time format to yyyy-mm-dd hh:mm:ss.SSS, but I am not able to achieve the conversion. I am new to Pig and I am trying to write a UDF to handle the conversion from ISO format to yyyy-mm-dd hh:mm:ss.SSS.

Kindly guide me. I tried Pig's built-in functions (FORMAT, DATE_FORMAT), but was not able to convert the data to the needed format.

Current data format: 2013-08-22T13:23:18.226220+01:00

Required Data format: 2013-08-22 13:23:18.226

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;
import org.joda.time.format.ISODateTimeFormat;

public class test extends EvalFunc<String> {

    // Parses ISO 8601 input and keeps the offset written in the string (e.g. +01:00)
    private static final DateTimeFormatter ISO_PARSER =
            ISODateTimeFormat.dateTimeParser().withOffsetParsed();

    // Case matters: MM = month, HH = hour of day (0-23), SSS = milliseconds
    private static final DateTimeFormatter OUTPUT_FORMAT =
            DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss.SSS");

    public String exec(Tuple input) throws IOException {
        if ((input == null) || (input.size() == 0) || (input.get(0) == null))
            return null;
        try {
            String time = (String) input.get(0);
            DateTime d_t = ISO_PARSER.parseDateTime(time);
            return getTimedt(d_t);
        } catch (IllegalArgumentException e) {
            // Input that cannot be parsed becomes a null field
            return null;
        }
    }

    // Prints the parsed instant in the target layout, truncating to milliseconds
    private String getTimedt(DateTime d_t) {
        return OUTPUT_FORMAT.print(d_t);
    }
}

How can I deal with date conversions in Pig?

user2667326 asked Sep 06 '13 11:09


2 Answers

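With Pig 0.11.1, a UDF is not required to convert from ISO 8601 format to yyyy-MM-dd HH:mm:ss.SSS format. The following example code shows how to convert a column of ISO 8601 dates into yyyy-MM-dd HH:mm:ss.SSS strings. Note the case of the pattern letters: MM is the month and HH is the hour of day (0-23), while mm is the minute and hh is the 12-hour clock hour.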

converted_dates = FOREACH input_dates GENERATE ToString(date,'yyyy-MM-dd HH:mm:ss.SSS') as date:chararray;
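To see why the case of the pattern letters matters, here is a small sketch in plain Java. It is an illustration only: it assumes Java 8+ and uses java.time rather than Pig or Joda-Time (these particular pattern letters mean the same thing in both), and the class and method names are made up for the example.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Pig: formats an ISO 8601 string with a given pattern.
class PatternCheck {

    static String reformat(String iso, String pattern) {
        // OffsetDateTime.parse reads ISO 8601 date-times that carry an offset
        OffsetDateTime parsed = OffsetDateTime.parse(iso);
        return parsed.format(DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        String iso = "2013-08-22T13:23:18.226220+01:00";
        // Upper-case MM and HH: month and hour of day
        System.out.println(reformat(iso, "yyyy-MM-dd HH:mm:ss.SSS"));
        // Lower-case mm and hh: minute and 12-hour clock hour
        System.out.println(reformat(iso, "yyyy-mm-dd hh:mm:ss.SSS"));
    }
}
```

The first call prints 2013-08-22 13:23:18.226, while the lower-case pattern prints 2013-23-22 01:23:18.226, because mm is the minute and hh is the 12-hour clock hour.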


NOTE:

I don't think the ToString function is documented. I guessed at this usage from this Google Summer of Code proposal:

http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/zjshen/21002

where the following function is mentioned as needing to be converted from a piggybank UDF into a built-in.

String ToString(DateTime d, String format)

My guess is that it was converted, but hasn't made its way into the main documentation yet. Here is the class documentation for the ToString built-in:

http://pig.apache.org/docs/r0.11.1/api/org/apache/pig/builtin/ToString.html

But we can see that the ToString function is missing from Apache's Pig documentation here:

http://pig.apache.org/docs/r0.11.1/func.html

Freerobots answered Oct 01 '22 18:10


2013-08-22T13:23:18.226220+01:00 is in XSD dateTime format, and it should be parsed this way:

XMLGregorianCalendar xc = DatatypeFactory.newInstance().newXMLGregorianCalendar("2013-08-22T13:23:18.226220+01:00");

From the XMLGregorianCalendar you can get a GregorianCalendar and then a java.util.Date:

GregorianCalendar gc = xc.toGregorianCalendar();
Date date = gc.getTime();
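Putting these steps together and formatting the result in the layout the question asks for might look like the sketch below. The class and method names are hypothetical, and reusing the calendar's own time zone for the output is an assumption on my part, made so that the printed wall-clock time matches the +01:00 input instead of the JVM default zone.

```java
import java.text.SimpleDateFormat;
import java.util.GregorianCalendar;
import java.util.Locale;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

// Hypothetical helper: XSD dateTime string -> "yyyy-MM-dd HH:mm:ss.SSS"
class XsdDateTimeExample {

    static String toTimestamp(String xsdDateTime) {
        try {
            XMLGregorianCalendar xc =
                    DatatypeFactory.newInstance().newXMLGregorianCalendar(xsdDateTime);
            GregorianCalendar gc = xc.toGregorianCalendar();

            SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS", Locale.ROOT);
            // Assumption: print in the offset carried by the input, not the JVM default zone
            out.setTimeZone(gc.getTimeZone());
            return out.format(gc.getTime());
        } catch (DatatypeConfigurationException e) {
            // No DatatypeFactory implementation available on this JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toTimestamp("2013-08-22T13:23:18.226220+01:00"));
    }
}
```

For the sample value this prints 2013-08-22 13:23:18.226; the fractional second is truncated to milliseconds when the calendar is built.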

Note that 226220 is the fractional second. If you try to parse it with SimpleDateFormat as SSS, it will be read as 226220 milliseconds, that is 226 s 220 ms, instead of 0.226220 s.
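The effect is easy to reproduce with a small sketch. The class name is made up, the offset is left out of the input to keep the pattern short, and both formatters are pinned to UTC (an arbitrary choice) so the result does not depend on the machine's time zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo of the SSS pitfall with a six-digit fraction
class FractionPitfall {

    static String parseAndPrint(String text) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS", Locale.ROOT);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS", Locale.ROOT);
        in.setTimeZone(utc);
        out.setTimeZone(utc);
        try {
            // Lenient parsing reads all six digits as a millisecond count
            Date parsed = in.parse(text);
            return out.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAndPrint("2013-08-22T13:23:18.226220"));
    }
}
```

This prints 2013-08-22 13:27:04.220: the 226220 ms are added on top of 13:23:18, which shifts the time by 3 minutes 46.220 seconds.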

Evgeniy Dorofeev answered Oct 01 '22 19:10