Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to return Struct from Hive UDF?

I am having trouble finding documentation on how to use a Hive UDF to return a Struct.

My major questions are:

What types of objects do I start with in Java?

How do I convert them so they will be interpreted as a Struct in Hive?

like image 252
DJElbow Avatar asked Mar 18 '23 09:03

DJElbow


1 Answers

Here is a very simple example of such kind of UDF. It receives an User-Agent string, parse it using external lib and returns a structure with 4 text fields:

STRUCT<type: string, os: string, family: string, device: string>

You need to extend GenericUDF class and override two most important methods: initialize and evaluate.

initialize() describes the structure itself and defines data types inside.

evaluate() fills up the structure with actual values.

You don't need any special classes to return, struct<> in Hive is just an array of objects in Java.

import java.util.ArrayList;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

import eu.bitwalker.useragentutils.UserAgent;

public class UAStructUDF extends GenericUDF {

    private Object[] result;

    @Override
    public String getDisplayString(String[] arg0) {
        return "My display string";
    }

    @Override
    public ObjectInspector initialize(ObjectInspector[] arg0) throws UDFArgumentException {
        // Define the field names for the struct<> and their types
        ArrayList<String> structFieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> structFieldObjectInspectors = new ArrayList<ObjectInspector>();

        // fill struct field names
        // type
        structFieldNames.add("type");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        //family
        structFieldNames.add("family");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        // OS name
        structFieldNames.add("os");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        // device
        structFieldNames.add("device");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);

        StructObjectInspector si = ObjectInspectorFactory.getStandardStructObjectInspector(structFieldNames,
                structFieldObjectInspectors);
        return si;
    }

    @Override
    public Object evaluate(DeferredObject[] args) throws HiveException {
        if (args == null || args.length < 1) {
            throw new HiveException("args is empty");
        }
        if (args[0].get() == null) {
            throw new HiveException("args contains null instead of object");
        }

        Object argObj = args[0].get();

        // get argument
        String argument = null;     
        if (argObj instanceof Text){
            argument = ((Text) argObj).toString();
        } else if (argObj instanceof String){
            argument = (String) argObj;
        } else {
            throw new HiveException("Argument is neither a Text nor String, it is a " + argObj.getClass().getCanonicalName());
        }
        // parse UA string and return struct, which is just an array of objects: Object[] 
        return parseUAString(argument);
    }

    private Object parseUAString(String argument) {
        result = new Object[4];
        UserAgent ua = new UserAgent(argument);
        result[0] = new Text(ua.getBrowser().getBrowserType().getName());
        result[1] = new Text(ua.getBrowser().getGroup().getName());
        result[2] = new Text(ua.getOperatingSystem().getName());
        result[3] = new Text(ua.getOperatingSystem().getDeviceType().getName());
        return result;
    }
}
like image 140
Dennis Tsvetkov Avatar answered Apr 02 '23 15:04

Dennis Tsvetkov