Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting an file exists error while import into Hive using sqoop

Tags:

import

hive

sqoop

I am trying to copy the retail_db database tables into hive database which I already created. When I execute the following code

sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--hive-import \
--hive-overwrite \
--create-hive-table \
--outdir java_files \
--hive-database retail_stage

My Map-reduce job stops with the following error:

ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/categories already exists

I am trying to copy the tables to hive database,Then why an existing file in cloudera caused the problem. Is there a way to ignore this error or overwrite the existing file.

like image 801
Priyaranjan Avatar asked May 02 '26 04:05

Priyaranjan


2 Answers

This is how sqoop imports job works:

  • sqoop creates/imports data in tmp dir(HDFS) which is user's home dir(in your case it is /user/cloudera).

  • Then copy data to its actual hive location (i.e., /user/hive/wearhouse.

  • This categories dir should have exist before you ran import statements. so delete that dir or rename it if its important.

hadoop fs -rmr /user/cloudera/categories

OR

hadoop fs -mv /user/cloudera/categories /user/cloudera/categories_1

and re-run sqoop command!

So in short, Importing to Hive will use hdfs as the staging place and sqoop deletes staging dir /user/cloudera/categories after copying(sucessfully) to actual hdfs location - it is last stage of sqoop job to clean up staging/tmp files - so if you try to list the tmp staging dir, you won't find it.

After successful import: hadoop fs -ls /user/cloudera/categories - dir will not be there.

like image 110
Ronak Patel Avatar answered May 04 '26 19:05

Ronak Patel


Sqoop import to Hive works in 3 steps:

  • Put data to HDFS
  • Create Hive table if not exists
  • Load data into Hive Table

You have not mentioned --target-dir or --warehouse-dir, so it will put data in HDFS Home Directory which I believe /user/cloudera/ in your case.

Now for a MySQL table categories you might have imported it earlier. So, /user/cloudera/categories directory exists and you are getting this exception.

Add any non-existing directory in --taget-dir like --taget-dir /user/cloudera/mysqldata. Then sqoop will put all the Mysql Tables imported by above command in this location.

like image 21
Dev Avatar answered May 04 '26 21:05

Dev