
How To Refresh/Clear the DistributedCache When Using Hue + Beeswax To Run Hive Queries That Define Custom UDFs?

I've set up a Hadoop cluster (using the Cloudera distro through Cloudera Manager) and I'm running some Hive queries using the Hue interface, which uses Beeswax underneath.

All my queries run fine and I have even successfully deployed a custom UDF.

But while deploying the UDF, I ran into a very frustrating versioning issue. In the initial version of my UDF class, I used a third-party class that was causing a StackOverflowError.

I fixed this error and then verified that the UDF can be deployed and used successfully from the hive command line.

Then, when I went back to using Hue and Beeswax, I kept getting the same error. The only way I could fix it was by renaming my UDF Java class (from Lower to Lower2).

Now, my question is: what is the proper way to deal with this kind of versioning issue?

From what I understand, when I add jars using the handy form fields to the left, they get added to the distributed cache. So, how do I refresh/clear the distributed cache? (I couldn't get LIST JARS; and similar commands to run from within Hive / Beeswax. They give me a syntax error.)

nemo asked Apr 27 '13 00:04




1 Answer

The classes are loaded into the Beeswax Server JVM (the same goes for the HiveServer1 and HiveServer2 JVMs), so deploying a new version of a jar often requires restarting these services to avoid such class-loading issues.
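To illustrate the class-loading behaviour in isolation, here is a minimal sketch in plain Java. It is not Beeswax or Hive code: `ClassCacheDemo`, `lookup`, and the nested `Lower` class are hypothetical names chosen to mirror the question. It only shows that, within one JVM and one class loader, a second lookup of the same class name returns the already-loaded class instead of loading it again.

```java
public class ClassCacheDemo {

    // Counts how often the JVM runs the static initializer of the stand-in UDF class.
    public static int lowerInitCount = 0;

    /** Hypothetical stand-in for the UDF class named in the question. */
    static class Lower {
        static {
            lowerInitCount++;
        }
    }

    /** Resolves a class by name, roughly as a long-running server would for a UDF. */
    public static Class<?> lookup(String binaryName) {
        try {
            return Class.forName(binaryName, true, ClassCacheDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class not found: " + binaryName, e);
        }
    }

    /** Looks the same name up twice and reports whether both results are one Class object. */
    public static boolean sameClassOnRepeatedLookup() {
        String name = Lower.class.getName(); // taking the name does not initialize the class
        Class<?> first = lookup(name);       // first lookup loads and initializes it
        Class<?> second = lookup(name);      // second lookup is served from the loader's cache
        return first == second;
    }

    public static void main(String[] args) {
        System.out.println("same Class object: " + sameClassOnRepeatedLookup());
        System.out.println("static initializer runs: " + lowerInitCount);
    }
}
```

Under that assumption, replacing the jar on disk does not change what an already-running JVM returns for `Lower`. A new name such as `Lower2` has no cached entry, which would explain why the rename worked, and a restart simply starts with an empty cache.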

Harsh J answered Oct 18 '22 19:10