What is the difference between Apache Sqoop and Hive? I know that sqoop is used to import/export data from RDBMS to HDFS and Hive is a SQL layer abstraction on top of Hadoop. Can I can use Sqoop for importing data into HDFS and then use Hive for querying?
Yes, you can. In fact many people use sqoop and hive for exactly what you have told.
In my project what I had to do was to load the historical data from my RDBMS which was oracle, move it to HDFS. I had hive external tables defined for this path. This allowed me to run hive queries to do transformations. Also, we used to write mapreduce programs on top of these data to come up with various analysis.
Sqoop transfers data between HDFS and relational databases. You can use Sqoop to transfer data from a relational database management system (RDBMS) such as MySQL or Oracle into HDFS and use MapReduce on the transferred data. Sqoop can export this transformed data back into an RDBMS as well. More info http://sqoop.apache.org/docs/1.4.3/index.html
Hive is a data warehouse software that facilitates querying and managing large datasets residing in HDFS. Hive provides schema on read (as opposed to schema on write for RDBMS) onto the data and the ability to query the data using a SQL-like language called HiveQL. More info https://hive.apache.org/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With