How can we automate the incremental import in SQoop ?
In incremental import, we need to give the --last-value
to start the import from the last value onwards, but my job is to frequently import from RDBMS, I don't want to give last value manually, is there any way we can automate this process?
An alternate approach to @Durga Viswanath Gadiraju answer.
In case you are importing the data to a hive table , you could query the last updated value from the hive table and pass the value to the sqoop import query. You could use shell script or oozie actions for achieving this.
Shell script :
lastupdatedvalue=`hive -e 'select last_value from table` #tweak the selection query based on the logic.
sqoop import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --incremental append --last-value ${lastupdatedvalue}
Oozie approach :
PFB a sudo workflow :
<workflow-app name="sqoop-to-hive" xmlns="uri:oozie:workflow:0.4">
<start to="hiveact"/>
<action name="hiveact">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<script>script.sql</script>
<capture-output/>
</hive>
<ok to="sqoopact"/>
<error to="kill"/>
<action name="sqoopact">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<command>import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --incremental append --last-value ${wf:actionData('hiveact')}</command>
</sqoop>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed</message>
</kill>
<end name="end"/>
Hope this helps.
You can take advantage of the built-in Sqoop metastore
You can create a simple incremental import job with the following command:
sqoop job \ --create <> \ --\ import \ --connect <> \ --username <> \ --password <> \ --table <> \ --incremental append \ --check-<> \ --last-value 0
And start it with the --exec parameter:
sqoop job --exec <<Job Name>>
Sqoop will automatically serialize the last imported value back into the metastore after each successful incremental job
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With