Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is --direct mode in sqoop?

As per my understanding sqoop is used to import or export table/data from the Database to HDFS or Hive or HBASE.

And we can directly import a single table or list of tables. Internally mapreduce program (i think only map task) will run.

My doubt is what is sqoop direct and what when to go with sqoop direct option?

like image 633
Raj Avatar asked Aug 25 '16 16:08

Raj


People also ask

What is direct mode in Sqoop?

By default, Sqoop uses JDBC to connect to the database. However, depending on the database, there may be a faster, database-specific connector available, which you can use by using the --direct option. So, you go with --direct option when you want to use a different database connector than the default.

What are the 2 main functions of Sqoop?

Sqoop has two main functions: importing and exporting. Importing transfers structured data into HDFS; exporting moves this data from Hadoop to external databases in the cloud or on-premises. Importing involves Sqoop assessing the external database's metadata before mapping it to Hadoop.

Why is the default maximum mappers are 4 in Sqoop?

when we don't mention the number of mappers while transferring the data from RDBMS to HDFS file system sqoop will use default number of mapper 4. Sqoop imports data in parallel from most database sources. Sqoop only uses mappers as it does parallel import and export.

What are the two different incremental modes of importing data into Sqoop?

Sqoop supports two types of incremental imports: append and lastmodified . You can use the --incremental argument to specify the type of incremental import to perform. You should specify append mode when importing a table where new rows are continually being added with increasing row id values.


2 Answers

Just read the Sqoop documentation!

  • General principles are located here for imports and there for exports

Some databases can perform imports in a more high-performance fashion by using database-specific data movement tools (...)


Some databases provides a direct mode for exports as well (...)

Details about use of direct mode with each specific RDBMS, installation requirements, available options and limitations can be found in Section 25
  • Section 25 under MySQL
  • Section 25 under Oracle data connector for Hadoop
  • etc.

Bottom line: "direct mode" means different things for different databases.
For MySQL or PostgreSQL it relates to bulk loader/unloader utilities (i.e. completetely bypassing JDBC); while for Oracle it relates to "direct path INSERT" i.e. with JDBC but in a non-transactional mode (so you'd better use a temp table, or you might end up with duplicates in a PK and a corrupt table).

like image 90
Samson Scharfrichter Avatar answered Sep 21 '22 13:09

Samson Scharfrichter


To be short and precise,its the mode for fast import which doesn't runs any mappers or reducers.

sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --direct

Notes:

  1. --direct is only supported in mysql and postgresql.
  2. Sqoop’s direct mode does not support imports of BLOB, CLOB, or LONGVARBINARY columns.
like image 40
Subash Avatar answered Sep 22 '22 13:09

Subash