I have a large MySQL table that I would like to transfer to a Hadoop/Hive table. Are there standard commands or techniques for transferring a simple (but large) table from MySQL to Hive? The table stores mostly analytics data.
First of all, download mysql-connector-java-5.0.8 and put the JAR in the lib and bin folders of Sqoop.
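A minimal sketch of that step, assuming the connector tarball has already been downloaded and Sqoop is installed under /usr/lib/sqoop (both the paths and the archive layout are assumptions; adjust to your setup):

# Unpack the downloaded MySQL JDBC connector
tar -xzf mysql-connector-java-5.0.8.tar.gz
# Copy the driver JAR into Sqoop's lib directory so Sqoop can load it
cp mysql-connector-java-5.0.8/mysql-connector-java-5.0.8-bin.jar /usr/lib/sqoop/lib/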
Create the table definition in Hive with the exact field names and types as in MySQL.
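For illustration, a minimal Hive definition for a hypothetical employee table (the columns id, name, salary are assumed; use your real MySQL column names and types). Note the field delimiter matches the --fields-terminated-by ',' used in the import command below:

# Create the target table in Hive with a comma delimiter
hive -e "CREATE TABLE employee (id INT, name STRING, salary DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';"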
sqoop import --verbose --connect jdbc:mysql://localhost/test --table employee --hive-import --warehouse-dir /user/hive/warehouse --fields-terminated-by ',' --split-by id --hive-table employee
test - the database name
employee - the table name (present in test)
/user/hive/warehouse - the directory in HDFS where the data is imported
--split-by id - the column used to split the import into parallel tasks; id is typically the primary key of the 'employee' table
--hive-table employee - the employee table whose definition is already present in Hive
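Once the job completes, a quick sanity check is to compare row counts on both sides (the connection details below are assumptions; substitute your own user and host):

# Row count on the Hive side
hive -e "SELECT COUNT(*) FROM employee;"
# Row count on the MySQL side, for comparison
mysql -u root -p test -e "SELECT COUNT(*) FROM employee;"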
Sqoop User Guide (one of the best guides for learning Sqoop)
Apache Sqoop is a tool that solves this problem:
Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.