I can't work out the difference between the following two options in Sqoop. It would help if someone could explain them with small examples.
--warehouse-dir and --target-dir
Thanks
Sqoop has two main functions: importing and exporting. Importing transfers structured data into HDFS; exporting moves this data from Hadoop to external databases in the cloud or on-premises. Importing involves Sqoop assessing the external database's metadata before mapping it to Hadoop.
By default, each import goes to a new target location. If the destination directory already exists in HDFS, Sqoop refuses to run the import rather than overwrite that directory's contents.
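If the directory is already there, Sqoop 1.4.x offers two ways around the refusal: --delete-target-dir removes the directory before importing, and --append adds new files next to the existing ones. A minimal sketch of the three cases follows. The helper import_cmd is hypothetical, it only prints the command instead of running it, and the options file path is borrowed from the example later in this thread.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a Sqoop import command.
# $1 = table, $2 = target directory,
# $3 = what to do if the directory exists: fail (Sqoop's default), replace, append.
import_cmd() {
    table=$1
    dir=$2
    mode=${3:-fail}
    cmd="sqoop import --options-file /home/cloudera/sqoop/conn --table $table --target-dir $dir -m 1"
    case "$mode" in
        replace) cmd="$cmd --delete-target-dir" ;;  # remove the directory first
        append)  cmd="$cmd --append" ;;             # keep old files, add new ones
        fail)    ;;                                 # default: Sqoop aborts if the directory exists
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "$cmd"
}

import_cmd movies /sqoop_table_movies replace
```

To actually run the import on a machine with Sqoop installed, pipe the printed line to sh.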
--warehouse-dir names a parent directory; every imported table is stored beneath it in a folder named after the table. With --target-dir you import table by table and must give a different directory each time, because two imports cannot share the same target directory.
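The contrast can be sketched as follows. The base directory /sqoop_tables and the table name ratings are made up for illustration, the options file path is the one used later in this thread, and the commands are only printed, not run.

```shell
#!/bin/sh
# Hypothetical base directory and table names, for illustration only.
BASE_DIR=/sqoop_tables
TABLES="movies ratings"

# With --target-dir, every table needs its own, distinct directory.
per_table_imports() {
    for t in $TABLES; do
        echo "sqoop import --options-file /home/cloudera/sqoop/conn --table $t --target-dir $BASE_DIR/$t -m 1"
    done
}

# With --warehouse-dir, the same parent is reused and Sqoop adds the table name itself.
warehouse_import() {
    echo "sqoop import --options-file /home/cloudera/sqoop/conn --table $1 --warehouse-dir $BASE_DIR -m 1"
}

per_table_imports
warehouse_import movies
```

Both forms should leave the movies data under /sqoop_tables/movies; the difference is only in who spells out the table folder.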
We can specify the target directory when importing table data into HDFS with the Sqoop import tool by passing the --target-dir option to the import command. For example, passing --target-dir /queryresult imports the emp_add table data into the '/queryresult' directory.
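A command for the emp_add import might look like the sketch below. The JDBC URL and user name are assumptions, since the text does not give them, and authentication options are left out. The script prints the command rather than running it.

```shell
#!/bin/sh
# Assumed connection details, not given in the text; replace with your own.
CONNECT_URL="jdbc:mysql://localhost/userdb"
DB_USER="root"

# Print the import command that writes emp_add into /queryresult.
emp_add_import_cmd() {
    echo "sqoop import --connect $CONNECT_URL --username $DB_USER --table emp_add --target-dir /queryresult -m 1"
}

emp_add_import_cmd
```

After a successful run, hdfs dfs -ls /queryresult should list the part files directly in that directory.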
As I understand it, for imports:
--warehouse-dir: It creates a directory that acts as the database directory (sqoop_db_movies). A directory named after the table (as given in the import command) is created automatically inside the warehouse dir and holds the imported files.
Example: sqoop import --options-file /home/cloudera/sqoop/conn --table movies --warehouse-dir /sqoop_db_movies -m 1
Output as:
/sqoop_db_movies/movies
/sqoop_db_movies/movies/_SUCCESS
/sqoop_db_movies/movies/part-m-00000
--target-dir: It creates a directory that acts as the table directory (sqoop_table_movies) and holds the imported files directly.
Example: sqoop import --options-file /home/cloudera/sqoop/conn --table movies --target-dir /sqoop_table_movies -m 1
Output as:
/sqoop_table_movies/_SUCCESS
/sqoop_table_movies/part-m-00000
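The two listings above boil down to one rule, sketched here as a small helper. import_path is a hypothetical function written for this explanation, not part of Sqoop. Note also that the Sqoop user guide describes the two options as incompatible, so an import takes one or the other.

```shell
#!/bin/sh
# Hypothetical helper: print the HDFS directory that will hold the part files.
# $1 = option name, $2 = the directory passed to it, $3 = table name.
import_path() {
    case "$1" in
        --warehouse-dir) echo "$2/$3" ;;  # parent directory plus table name
        --target-dir)    echo "$2" ;;     # the directory itself, table name not added
        *) echo "expected --warehouse-dir or --target-dir" >&2; return 1 ;;
    esac
}

import_path --warehouse-dir /sqoop_db_movies movies    # prints /sqoop_db_movies/movies
import_path --target-dir /sqoop_table_movies movies    # prints /sqoop_table_movies
```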
The parameter below can point to the default Hive table location. It can be used for development, where you just want to run some tests on internal tables.
--warehouse-dir
The parameter below points to an HDFS location of your choice, on top of which you can create external Hive tables. This is useful in a production environment, where you want all data to be available in an external directory with an external table over it.
--target-dir