I'd like to know what code the Hive SQL compiler generates. That is, if I execute one SQL statement, I'd like to see the code of the MapReduce jobs that the Hive compiler generates for it.
How can I get it?
At the highest level, there are four independent entities:
• The client, which submits the MapReduce job.
• The jobtracker, which coordinates the job run. The jobtracker is a Java application whose main class is JobTracker.
• The tasktrackers, which run the tasks that the job has been split into.
• The distributed filesystem (normally HDFS), which is used for sharing job files between the other entities.
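To make the division of labour concrete, here is a toy Python sketch of the client, jobtracker and tasktrackers. All class and function names are hypothetical, and the round-robin scheduling is an assumption for illustration only; it is not how Hadoop's JobTracker actually assigns tasks.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class TaskTracker:
    """Hypothetical stand-in for a tasktracker: runs the tasks it is handed."""
    name: str
    completed: List[str] = field(default_factory=list)

    def run(self, task_id: str, fn: Callable[[List[Any]], Any], split: List[Any]) -> Any:
        self.completed.append(task_id)  # record which task ran on this tracker
        return fn(split)


class JobTracker:
    """Hypothetical stand-in for the jobtracker: coordinates the job run."""

    def __init__(self, trackers: List[TaskTracker]) -> None:
        self.trackers = trackers

    def run_job(self, splits: List[List[Any]], fn: Callable[[List[Any]], Any]) -> List[Any]:
        results = []
        for i, split in enumerate(splits):
            # Assumption: simple round-robin assignment. The real jobtracker
            # schedules on tasktracker heartbeats and prefers data-local slots.
            tracker = self.trackers[i % len(self.trackers)]
            results.append(tracker.run(f"task_{i}", fn, split))
        return results


def submit_job(jobtracker: JobTracker, data: List[Any], split_size: int,
               fn: Callable[[List[Any]], Any]) -> List[Any]:
    """The client: split the input and submit the job to the jobtracker."""
    splits = [data[i:i + split_size] for i in range(0, len(data), split_size)]
    return jobtracker.run_job(splits, fn)


if __name__ == "__main__":
    trackers = [TaskTracker("tt1"), TaskTracker("tt2")]
    jt = JobTracker(trackers)
    print(submit_job(jt, list(range(10)), 3, sum))  # [3, 12, 21, 9]
```

The shared distributed filesystem is left out here; in a real cluster the job files and splits would be staged there rather than passed around in memory.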
Hive is a data warehousing framework that runs on top of Hadoop and provides an SQL abstraction over MapReduce applications, so data analysts and business intelligence users do not need to learn another complex programming language to write MapReduce apps.
Hive transforms HiveQL queries into MapReduce or Tez jobs that run on Apache Hadoop's distributed job scheduling framework, Yet Another Resource Negotiator (YARN). It queries data stored in a distributed storage solution, like the Hadoop Distributed File System (HDFS) or Amazon S3.
Hive uses a query language called HiveQL, which is similar to SQL. The user first submits HiveQL queries; Hive converts these queries into MapReduce tasks, which then run on the Hadoop MapReduce system.
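As a rough illustration of that conversion, the Python sketch below imitates how a simple aggregation such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` maps onto a map phase, a shuffle and a reduce phase. The table and column names are hypothetical, and the sketch omits optimizations Hive applies in practice, such as map-side partial aggregation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical query this sketch imitates:
#   SELECT dept, COUNT(*) FROM employees GROUP BY dept


def map_phase(rows: Iterable[dict]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (group key, 1) for every input row."""
    for row in rows:
        yield row["dept"], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: collect all values for the same key, ordered by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: COUNT(*) becomes a sum of the emitted 1s for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_group_by_count(rows: Iterable[dict]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    rows = [{"dept": "eng"}, {"dept": "ops"}, {"dept": "eng"}]
    print(run_group_by_count(rows))  # {'eng': 2, 'ops': 1}
```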
Hive serializes the physical plan into an XML file (page 15 in http://www.slideshare.net/nzhang/hive-anatomy), so I do not think users can get real Hadoop source code from Hive. To get that code, you can try YSmart (http://ysmart.cse.ohio-state.edu/), a translator that converts SQL queries into Java source code for Hadoop. You can use the online version of YSmart: just submit the schema and your query, and you will be able to view and download the Java code.
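If you only need to see what Hive plans to run, rather than Java source, Hive's `EXPLAIN` statement (or `EXPLAIN EXTENDED` for more detail) prints the compiled plan: the stage dependencies and, for MapReduce stages, the map and reduce operator trees. The Python sketch below assumes a working `hive` command-line client on the PATH; the helper names are hypothetical, and on HiveServer2 deployments you may need `beeline` instead.

```python
import subprocess
from typing import List


def explain_command(query: str, extended: bool = False) -> List[str]:
    """Build a `hive -e` invocation that asks Hive for the query plan."""
    keyword = "EXPLAIN EXTENDED" if extended else "EXPLAIN"
    return ["hive", "-e", f"{keyword} {query}"]


def show_plan(query: str, extended: bool = False) -> str:
    """Run the command (assumes `hive` is installed) and return its stdout."""
    result = subprocess.run(explain_command(query, extended),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only builds the command; calling show_plan() requires a Hive installation.
    print(explain_command("SELECT dept, COUNT(*) FROM employees GROUP BY dept"))
```

Passing the arguments as a list (no shell) avoids quoting problems with the query text. The output describes the plan's stages and operators, not generated Java classes.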