Can anyone tell the difference between the create-hive-table and hive-import methods? Both will create a Hive table, but what is the significance of each?
Sqoop is a tool that enables you to bulk import and export data from a database. You can use Sqoop to import data into HDFS or directly into Hive. However, Sqoop can only import data into Hive as a text file or as a SequenceFile.
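As a minimal sketch, the file format can be stated explicitly with --as-textfile (text is also the default for Hive imports); this reuses the employees table from the example further below:
sqoop import --connect jdbc:mysql://localhost:3306/hadoopexample --table employees --hive-import --as-textfile -m 1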
Apache Sqoop is designed to efficiently transfer enormous volumes of data between Apache Hadoop and structured datastores such as relational databases. It helps to offload certain tasks, such as ETL processing, from an enterprise data warehouse to Hadoop, for efficient execution at a much lower cost.
Sqoop does not support creating Hive external tables. Instead, you might use the Sqoop codegen command to generate the SQL for creating the Hive internal table that matches your remote RDBMS table (see http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_codegen_literal).
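A common workaround, sketched here under assumptions (the HDFS path /data/employees and the column names empid, empname, deptid are illustrative, not from the original), is to import into a plain HDFS directory and then declare an external table over it by hand:
sqoop import --connect jdbc:mysql://localhost:3306/hadoopexample --table employees --target-dir /data/employees --fields-terminated-by ',' -m 1
hive> create external table employees_ext (empid int, empname string, deptid int) row format delimited fields terminated by ',' location '/data/employees';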
Sqoop has two main functions: importing and exporting. Importing transfers structured data into HDFS; exporting moves this data from Hadoop to external databases in the cloud or on-premises. Importing involves Sqoop assessing the external database's metadata before mapping it to Hadoop.
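For the export direction, a minimal sketch might look like this (the HDFS path is an assumption; --export-dir points at the data to push back to the RDBMS):
sqoop export --connect jdbc:mysql://localhost:3306/hadoopexample --table employees --export-dir /user/hive/warehouse/employees --fields-terminated-by ',' -m 1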
hive-import command:
The hive-import command automatically populates the metadata for the imported table in the Hive metastore. If the table does not exist in Hive yet, Sqoop will simply create it based on the metadata fetched for your table or query. If the table already exists, Sqoop will import data into the existing table. If you're creating a new Hive table, Sqoop will convert the data types of each column from your source table to a type compatible with Hive.
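If the default type conversion is not what you want, it can be overridden per column with --map-column-hive; the column-to-type mapping below is only an assumed illustration:
sqoop import --connect jdbc:mysql://localhost:3306/hadoopexample --table employees --hive-import --map-column-hive empid=INT,deptid=INT -m 1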
create-hive-table command:
Sqoop can generate a Hive table (using the create-hive-table command) based on a table in an existing relational data source. When the corresponding --create-hive-table option is set on an import, the job will fail if the target Hive table already exists; by default this property is false.
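For instance (a hedged sketch), adding --create-hive-table to a hive-import makes the job abort when the table is already present instead of appending to it:
sqoop import --connect jdbc:mysql://localhost:3306/hadoopexample --table employees --hive-import --create-hive-table -m 1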
Using the create-hive-table command involves three steps: importing data into HDFS, creating the Hive table, and then loading the HDFS data into Hive. This can be shortened to one step by using hive-import.
During a hive-import, Sqoop first does a normal HDFS import to a temporary location. After a successful import, Sqoop generates two queries: one for creating the table and another for loading the data from the temporary location. You can specify the temporary location using either the --target-dir or --warehouse-dir parameter.
An example for the above description is added below.
Using the create-hive-table command:
This involves three steps:
1. Importing data from RDBMS to HDFS
sqoop import --connect jdbc:mysql://localhost:3306/hadoopexample --table employees --split-by empid -m 1;
2. Creating the Hive table using the create-hive-table command
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/hadoopexample --table employees --fields-terminated-by ',';
3. Loading the data into Hive
hive> load data inpath "employees" into table employees;
Loading data to table default.employees
Table default.employees stats: [numFiles=1, totalSize=70]
OK
Time taken: 2.269 seconds
hive> select * from employees;
OK
1001 emp1 101
1002 emp2 102
1003 emp3 101
1004 emp4 101
1005 emp5 103
Time taken: 0.334 seconds, Fetched: 5 row(s)
Using the hive-import command:
sqoop import --connect jdbc:mysql://localhost:3306/hadoopexample --table departments --split-by deptid -m 1 --hive-import;
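Two related options may also be useful here (a hedged sketch; the target table name dept_copy is an assumption): --hive-table picks the Hive table to load into, and --hive-overwrite replaces existing data instead of appending:
sqoop import --connect jdbc:mysql://localhost:3306/hadoopexample --table departments --split-by deptid -m 1 --hive-import --hive-table dept_copy --hive-overwrite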