Is it possible to overwrite the HDFS directory automatically during a Sqoop import, instead of having to delete it manually every time? (Do we have an option like "--overwrite", as we have "--hive-overwrite" for Hive imports?)
By default, Sqoop will fail such executions because it does not allow overwriting an existing directory in HDFS. There is a way to overcome this and replace the existing data: use the Sqoop directive --delete-target-dir together with the --target-dir parameter.
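A minimal sketch of such a command (the database, user, table, and path names here are hypothetical, not taken from the question):
$ sqoop import --connect jdbc:mysql://localhost/mydb --username myuser -P --table employees --delete-target-dir --target-dir /user/data/employees -m 1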
By default, if the Hive table already exists, the data will be appended to it. We can overwrite the data by specifying --hive-overwrite when performing a Sqoop import with --hive-import. We can then re-run the command and validate the result.
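For example, a sketch of a Hive-overwrite import (connection details, table, and Hive table name are assumed for illustration):
$ sqoop import --connect jdbc:mysql://localhost/mydb --username myuser -P --table employees --hive-import --hive-overwrite --hive-table default.employees -m 1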
By default, imports go to a new target location. If the destination directory already exists in HDFS, Sqoop will refuse to import and overwrite that directory's contents.
Use --delete-target-dir. It will delete the <HDFS-target-dir> provided in the command before writing data to that directory.
Use this: --delete-target-dir
This will overwrite the HDFS directory, using the following Sqoop syntax:
$ sqoop import --connect jdbc:mysql://localhost/dbname --username username -P --table tablename --delete-target-dir --target-dir '/targetdirectorypath' -m 1
E.g.:
$ sqoop import --connect jdbc:mysql://localhost/abc --username root -P --table empsqooptargetdel --delete-target-dir --target-dir '/tmp/sqooptargetdirdelete' -m 1
Every time it is run, this command will refresh the corresponding HDFS directory (or Hive table) with fresh, updated data.