Hive UDF with Python

I'm new to Python, pandas, and Hive and would definitely appreciate some tips.

I have the Python code below, which I would like to turn into a UDF in Hive. However, instead of reading a CSV, applying the transformations, and exporting another CSV, I would like to take a Hive table as the input and write the results to a new Hive table containing the transformed data.

Python Code:

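import pandas as pd

# Load the input data
df = pd.read_csv('Input.csv')
# Index by the grouping keys so they survive the one-hot encoding
df = df.set_index(['Field1', 'Field2'])
# One-hot encode Field3, then restore the keys as regular columns
dummies = pd.get_dummies(df['Field3']).reset_index()
# Drop repeated (Field1, Field2, Field3) combinations
df2 = dummies.drop_duplicates()
# One row per (Field1, Field2), summing the indicator columns
df3 = df2.groupby(['Field1', 'Field2']).sum()
df3.to_csv('Output.csv')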
asked Oct 19 '25 02:10 by user3476463

1 Answer

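You can use Hive's TRANSFORM clause to stream the rows of a table through a script written in Python. Strictly speaking this is a streaming script rather than a registered UDF: Hive pipes each row to the script's standard input as tab-separated text and reads the transformed rows back from its standard output. The detailed steps are outlined here and here.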
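As a rough illustration, here is a minimal sketch of what such a streaming script might look like for the code in the question. It does not use pandas; it reproduces the same result (one row per Field1/Field2 pair, with a 0/1 flag for each Field3 value) using only the standard library. It rests on several assumptions that are not in the question: the input has exactly three columns in the order Field1, Field2, Field3; the distinct values of Field3 are known in advance (the `CATEGORIES` list is hypothetical), since Hive needs a fixed output schema; and the rows reach the script grouped by key.

```python
import sys
from itertools import groupby

# Assumption: the possible Field3 values are known up front, because the
# Hive output table needs a fixed set of columns. These values are hypothetical.
CATEGORIES = ["a", "b", "c"]


def parse(lines):
    """Split tab-separated input lines into ((Field1, Field2), Field3) pairs."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            field1, field2, field3 = line.split("\t")
            yield (field1, field2), field3


def transform(lines):
    """Yield one tab-separated output row per (Field1, Field2) group."""
    # groupby only merges adjacent rows, so the input must already be
    # clustered by (Field1, Field2) when it reaches the script.
    for key, rows in groupby(parse(lines), key=lambda pair: pair[0]):
        seen = {field3 for _, field3 in rows}
        flags = ["1" if category in seen else "0" for category in CATEGORIES]
        yield "\t".join(list(key) + flags)


# Only stream when data is piped in, which is how Hive invokes the script.
if __name__ == "__main__" and not sys.stdin.isatty():
    for row in transform(sys.stdin):
        print(row)
```

On the Hive side, the script would typically be registered with `ADD FILE` and called with `SELECT TRANSFORM(...) USING '...' AS (...)`, with the whole query wrapped in a `CREATE TABLE ... AS SELECT` to produce the new table. Feeding the script from a subquery that uses `CLUSTER BY` on Field1 and Field2 should ensure that all rows of a group arrive together at the same script instance, which the grouping logic above depends on.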

answered Oct 21 '25 17:10 by visakh
