As per my understanding sqoop is used to import or export table/data from the Database to HDFS or Hive or HBASE. And we can directly import a single table or list of tables. Internally mapreduce program (i think only map task) will run. My doubt is what is sqoop direct and what when to go with sqoop direct option?

Just read the Sqoop documentation! <ul> <li>General principles are located here for imports and there for exports</li> </ul> <blockquote> Some databases can perform imports in a more high-performance fashion by using database-specific data movement tools (...) <hr> Some databases provides a direct mode for exports as well (...) <hr> Details about use of direct mode with each specific RDBMS, installation requirements, available options and limitations can be found in Section 25 </blockquote> <ul> <li>Section 25 under MySQL </li> <li>Section 25 under Oracle data connector for Hadoop </li> <li>etc.</li> </ul> Bottom line: "direct mode" means different things for different databases. For MySQL or PostgreSQL it relates to bulk loader/unloader utilities (i.e. completetely bypassing JDBC); while for Oracle it relates to "direct path INSERT" i.e. with JDBC but in a non-transactional mode (so you'd better use a temp table, or you might end up with duplicates in a PK and a corrupt table).

To be short and precise,its the mode for fast import which doesn't runs any mappers or reducers. <pre class="prettyprint"><code>sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --direct </code></pre> Notes: <ol> <li> <code>--direct</code> is only supported in mysql and postgresql.</li> <li>Sqoop’s direct mode does not support imports of <code>BLOB</code>, <code>CLOB</code>, or <code>LONGVARBINARY</code> columns.</li> </ol>

What is --direct mode in sqoop?

2 Answers

Just read the Sqoop documentation!

General principles are located here for imports and there for exports

Some databases can perform imports in a more high-performance fashion by using database-specific data movement tools (...)

Some databases provides a direct mode for exports as well (...)
Details about use of direct mode with each specific RDBMS, installation requirements, available options and limitations can be found in Section 25

Section 25 under MySQL
Section 25 under Oracle data connector for Hadoop
etc.

Bottom line: "direct mode" means different things for different databases.
For MySQL or PostgreSQL it relates to bulk loader/unloader utilities (i.e. completetely bypassing JDBC); while for Oracle it relates to "direct path INSERT" i.e. with JDBC but in a non-transactional mode (so you'd better use a temp table, or you might end up with duplicates in a PK and a corrupt table).

answered Sep 21 '22 13:09

Samson Scharfrichter

To be short and precise,its the mode for fast import which doesn't runs any mappers or reducers.

sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --direct

Notes:

--direct is only supported in mysql and postgresql.
Sqoop’s direct mode does not support imports of BLOB, CLOB, or LONGVARBINARY columns.

answered Sep 22 '22 13:09

Subash

Related questions
                            
                                Namenode failure and recovery in Hadoop
                            
                                Table Partitioned by Timestamp Field
                            
                                hadoop file system change directory command
                            
                                Hive, how do I retrieve all the database's tables columns
                            
                                Sqoop - Binding to YARN queues
                            
                                How to find if a folder exists in hadoop or not?
                            
                                Cannot create directory /home/hadoop/hadoopinfra/hdfs/namenode/current
                            
                                How to assign and use column headers in Spark?
                            
                                pyspark.sql.utils.IllegalArgumentException: u'java.net.UnknownHostException: user'
                            
                                When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment
                            
                                Debugging hadoop applications
                            
                                Suggestions needed for optimizing O(n^2) algorithm
                            
                                Apache Pig permissions issue
                            
                                In Hadoop where does the framework save the output of the Map task in a normal Map-Reduce Application?
                            
                                Name of Hive table is now a reserved keyword
                            
                                Where are the hadoop-examples* and hadoop-test* jars in Cloudera CDH?
                            
                                Junit External Resource @Rule Order
                            
                                How to run Hadoop on a Mesos cluster?
                            
                                java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
                            
                                Loading CSV file on Hive Table with String Array

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What is --direct mode in sqoop?

Tags:

hadoop

hadoop2

sqoop

sqoop2

Raj

People also ask

2 Answers

Samson Scharfrichter

Subash

Recent Activity

Donate For Us