Can anyone tell me what --split-by and the boundary query are used for in Sqoop? <blockquote> sqoop import --connect jdbc:mysql://localhost/my --username user --password 1234 --query 'select * from table where id=5 AND $CONDITIONS' --split-by table.id --target-dir /dir </blockquote>


What are the following commands in Sqoop?

1 Answer

Split by:

Why is it used? -> To speed up fetching data from an RDBMS into Hadoop.
How does it work? -> By default Sqoop runs 4 mappers, so the import runs in parallel. The data is divided into equal partitions on a split column, which defaults to the table's primary key. Sqoop first finds the minimum and maximum values of that column (this is the boundary query, which you can override with --boundary-query) and then divides that range into 4 sub-ranges, one per mapper. E.g. with 1000 records whose primary key has min value = 0 and max value = 1000, Sqoop creates 4 ranges: (0-250), (250-500), (500-750), (750-1000). Each mapper imports the rows whose column value falls in its range and stores them on HDFS. The ranges are equal in width, not in row count, so if the primary key values are not evenly distributed, some mappers get far more rows than others. In that case, use --split-by to choose a different, more evenly distributed column.

In short: it is used to partition the data to support parallelism and improve performance.
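To make the range arithmetic concrete, here is a small sketch of how a min/max pair can be cut into one range per mapper. It is only an illustration under simplifying assumptions (integer column, equal-width ranges); the function name `split_ranges` is made up for this example and this is not Sqoop's actual implementation.

```shell
#!/bin/sh
# Simplified illustration only; Sqoop's real splitter is part of its Java code.
# Usage: split_ranges COLUMN MIN MAX NUM_MAPPERS
# Prints one WHERE-style condition per mapper.
split_ranges() (
  col=$1; min=$2; max=$3; n=$4
  size=$(( (max - min) / n ))      # integer width of each range
  i=0
  while [ "$i" -lt "$n" ]; do
    lo=$(( min + i * size ))
    if [ "$i" -eq $(( n - 1 )) ]; then
      # The last range is closed so the maximum value itself is included.
      echo "$col >= $lo AND $col <= $max"
    else
      echo "$col >= $lo AND $col < $(( lo + size ))"
    fi
    i=$(( i + 1 ))
  done
)

# The example from the answer: min = 0, max = 1000, 4 mappers.
split_ranges id 0 1000 4
```

For the values above this prints four conditions, starting with `id >= 0 AND id < 250` and ending with `id >= 750 AND id <= 1000`. Note that nothing here looks at how many rows fall into each range, which is why a skewed column gives unbalanced mappers.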
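For reference, this is roughly how the options fit together on the command line. It is a dry-run sketch that only prints the command; the table `orders`, the column `id`, and the helper name `print_import_cmd` are hypothetical placeholders, and the connection details are copied from the question. The boundary query must return a single row with the minimum and maximum of the split column. Here it just spells out the usual MIN/MAX lookup, but you could swap in a cheaper or narrower query.

```shell
#!/bin/sh
# Dry-run sketch: prints the sqoop command instead of running it.
# "orders" and "id" are hypothetical; adjust them to your own schema.
print_import_cmd() {
  cat <<'EOF'
sqoop import \
  --connect jdbc:mysql://localhost/my \
  --username user --password 1234 \
  --query 'SELECT * FROM orders WHERE $CONDITIONS' \
  --split-by id \
  --boundary-query 'SELECT MIN(id), MAX(id) FROM orders' \
  --num-mappers 4 \
  --target-dir /dir
EOF
}

print_import_cmd
```

The single quotes around the query matter: they stop the shell from expanding `$CONDITIONS`, which Sqoop replaces with each mapper's range condition.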

117

answered Sep 20 '22 20:09

Tutu Kumari



Tags:

sqoop

NJ_315
