Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can we automate incremental import in SQOOP?

How can we automate the incremental import in SQoop ?

In incremental import, we need to give the --last-value to start the import from the last value onwards, but my job is to frequently import from RDBMS, I don't want to give last value manually, is there any way we can automate this process?

like image 392
Devender Prakash Avatar asked Dec 24 '22 10:12

Devender Prakash


2 Answers

An alternate approach to @Durga Viswanath Gadiraju answer.

In case you are importing the data to a hive table , you could query the last updated value from the hive table and pass the value to the sqoop import query. You could use shell script or oozie actions for achieving this.

Shell script :

lastupdatedvalue=`hive -e 'select last_value from table` #tweak the selection query based on the logic.

sqoop import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --incremental append --last-value ${lastupdatedvalue}

Oozie approach :

  1. Hive action for the select query based on the logic to retrieve the last updated value .
  2. Sqoop action for incremental load from the captured output of previous hive action.

PFB a sudo workflow :

<workflow-app name="sqoop-to-hive" xmlns="uri:oozie:workflow:0.4">
<start to="hiveact"/>
<action name="hiveact">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <script>script.sql</script>
<capture-output/>
    </hive>    
    <ok to="sqoopact"/>
    <error to="kill"/>

<action name="sqoopact">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <command>import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --incremental append --last-value ${wf:actionData('hiveact')}</command>
     </sqoop>
    <ok to="end"/>
    <error to="kill"/>
</action>
<kill name="kill">
    <message>Action failed</message>
</kill>
<end name="end"/>

Hope this helps.

like image 163
K S Nidhin Avatar answered Jan 04 '23 21:01

K S Nidhin


You can take advantage of the built-in Sqoop metastore

You can create a simple incremental import job with the following command:

sqoop job \ --create <> \ --\ import \ --connect <> \ --username <> \ --password <> \ --table <> \ --incremental append \ --check-<> \ --last-value 0

And start it with the --exec parameter:

sqoop job --exec <<Job Name>>

Sqoop will automatically serialize the last imported value back into the metastore after each successful incremental job

like image 35
Vamshi Avatar answered Jan 04 '23 19:01

Vamshi