Can anyone tell me what is the use of --split-by and boundary query in sqoop?
sqoop import --connect jdbc:mysql://localhost/my --username user --password 1234 --query 'select * from table where id=5 AND $CONDITIONS' --split-by table.id --target-dir /dir
Verify Job (--list) '--list' argument is used to verify the saved jobs. The following command is used to verify the list of saved Sqoop jobs. It shows the list of saved jobs.
Execute Job (--exec) '--exec' option is used to execute a saved job. The following command is used to execute a saved job called myjob.
Sqoop has two main functions: importing and exporting. Importing transfers structured data into HDFS; exporting moves this data from Hadoop to external databases in the cloud or on-premises. Importing involves Sqoop assessing the external database's metadata before mapping it to Hadoop.
It can be used to enhance the import performance by achieving greater parallelism. Sqoop creates splits based on values in a particular column of the table which is specified by --split-by by the user through the import command. If it is not available, the primary key of the input table is used to create the splits.
Split by :
In short: Used for partitioning of data to support parallelism and improve performance
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With