Differences between Apache Sqoop and Hive. Can we use both together?

Tags: hadoop, hive, sqoop

What is the difference between Apache Sqoop and Hive? I know that Sqoop is used to import/export data between an RDBMS and HDFS, and that Hive is a SQL abstraction layer on top of Hadoop. Can I use Sqoop to import data into HDFS and then use Hive to query it?

Raghavendra Kumar asked Oct 18 '25 03:10



2 Answers

Yes, you can. In fact, many people use Sqoop and Hive for exactly what you described.

In my project, I had to load historical data from our RDBMS (Oracle) into HDFS. I had Hive external tables defined over that HDFS path, which let me run Hive queries to do transformations. We also wrote MapReduce programs on top of the same data for various analyses.
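A minimal sketch of that workflow, assuming Sqoop 1.x and a plain delimited text import. It only builds the `sqoop import` command line and the matching Hive `CREATE EXTERNAL TABLE` statement; it does not run them. The JDBC URL, user, password-file path, table name, columns and HDFS directory are all hypothetical placeholders.

```python
import shlex


def sqoop_import_cmd(jdbc_url, user, password_file, table, target_dir, delimiter=","):
    """Build a Sqoop 1.x command that copies one RDBMS table into an HDFS directory."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,  # keeps the password off the command line
        "--table", table,
        "--target-dir", target_dir,        # HDFS directory Sqoop writes into
        "--fields-terminated-by", delimiter,
        "--as-textfile",
    ]


def hive_external_table_ddl(name, columns, location, delimiter=","):
    """Build HiveQL DDL for an external table over the directory Sqoop wrote to."""
    cols = ",\n  ".join(f"{col} {typ}" for col, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {name} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    target = "/data/raw/orders"  # hypothetical HDFS path
    cmd = sqoop_import_cmd(
        "jdbc:oracle:thin:@dbhost:1521:ORCL",  # hypothetical Oracle connection
        "etl_user", "/user/etl/.oracle.pw", "ORDERS", target,
    )
    print(shlex.join(cmd))
    print(hive_external_table_ddl(
        "orders",
        [("order_id", "BIGINT"), ("customer_id", "BIGINT"), ("amount", "DECIMAL(10,2)")],
        target,
    ))
```

Sqoop's default text field delimiter is a comma, while Hive's default for delimited tables is Ctrl-A (`\001`), so the sketch sets the same delimiter explicitly on both sides. Because the table is `EXTERNAL`, dropping it in Hive leaves the imported files in HDFS.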

vishnu viswanath answered Oct 20 '25 12:10



Sqoop transfers data between HDFS and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into HDFS and process it there with MapReduce. Sqoop can also export the transformed data back into an RDBMS. More info: http://sqoop.apache.org/docs/1.4.3/index.html
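To illustrate the export direction, here is a minimal sketch that builds (but does not run) a Sqoop 1.x `sqoop export` command. It assumes the transformed results sit in HDFS as comma-delimited text; the MySQL URL, user, password file, table and directory are hypothetical placeholders.

```python
import shlex


def sqoop_export_cmd(jdbc_url, user, password_file, table, export_dir, delimiter=","):
    """Build a Sqoop 1.x command that pushes delimited HDFS files into an RDBMS table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,
        "--table", table,                           # target table must already exist
        "--export-dir", export_dir,                 # HDFS directory with the transformed data
        "--input-fields-terminated-by", delimiter,  # how Sqoop should parse those files
    ]


if __name__ == "__main__":
    cmd = sqoop_export_cmd(
        "jdbc:mysql://dbhost/sales",      # hypothetical MySQL database
        "etl_user", "/user/etl/.mysql.pw",
        "daily_totals",                   # hypothetical pre-created table
        "/data/out/daily_totals",         # hypothetical HDFS output directory
    )
    print(shlex.join(cmd))
```

Unlike import, export does not create the target table, so its columns have to be defined in the RDBMS beforehand to match the fields in the HDFS files.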

Hive is data warehouse software that facilitates querying and managing large datasets residing in HDFS. Hive applies a schema to the data when it is read (schema on read, as opposed to the schema on write used by an RDBMS) and lets you query the data using a SQL-like language called HiveQL. More info: https://hive.apache.org/
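Schema on read can be illustrated in plain Python. This is only a conceptual sketch, not how Hive is implemented: the raw lines are stored unvalidated, and column names and types are applied only when the data is read. Roughly in the spirit of Hive's default text SerDe, a field that cannot be parsed or is missing comes back as `None` (NULL) instead of the write being rejected. The sample rows and schema are made up.

```python
from decimal import Decimal, InvalidOperation

# Lines as they might sit in an HDFS file: nothing was checked when they were written.
RAW_LINES = [
    "1,alice,19.99",
    "2,bob,not-a-number",
    "3,carol",
]

# Hypothetical schema, supplied only at read time.
SCHEMA = [("id", int), ("name", str), ("amount", Decimal)]


def parse(value, typ):
    """Convert one field, returning None when it does not fit the declared type."""
    try:
        return typ(value)
    except (ValueError, InvalidOperation):
        return None


def read_with_schema(lines, schema, delimiter=","):
    """Yield one dict per line, applying the schema while reading."""
    for line in lines:
        fields = line.split(delimiter)
        yield {
            col: parse(fields[i], typ) if i < len(fields) else None
            for i, (col, typ) in enumerate(schema)
        }


if __name__ == "__main__":
    for row in read_with_schema(RAW_LINES, SCHEMA):
        print(row)
```

An RDBMS using schema on write would have rejected the second and third lines at insert time; here they are stored as-is, and a different schema could later be applied to the same files.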

Chris Marotta answered Oct 20 '25 12:10
