Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sqoop import without primary key in RDBMS

Tags:

import

sqoop

Can I import RDBMS table data (table doesn't have a primary key) to hive using sqoop? If yes, then can you please give the sqoop import command.

I have tried with sqoop import general command, but it failed.

like image 333
KM Prak Avatar asked Mar 19 '15 06:03

KM Prak


2 Answers

If your table has no primary key defined then you have to give -m 1 option for importing the data or you have to provide --split-by argument with some column name, otherwise it gives the error:  

ERROR tool.ImportTool: Error during import: No primary key could be found for table <table_name>. Please specify one with --split-by or perform a sequential import with '-m 1'

then your sqoop command will look like

sqoop import \
    --connect jdbc:mysql://localhost/test_db \
    --username root \
    --password **** \
    --table user \
    --target-dir /user/root/user_data \
    --columns "first_name, last_name, created_date"
    -m 1

or

sqoop import \
    --connect jdbc:mysql://localhost/test_db \
    --username root \
    --password **** \
    --table user \
    --target-dir /user/root/user_data \
    --columns "first_name, last_name, created_date"
    --split-by created_date
like image 179
Prasad Khode Avatar answered Sep 20 '22 00:09

Prasad Khode


Quick view:

The Sqoop job fails and the error looks like this" Error during import: No primary key could be found for the table . Please specify one with --split-by or perform a sequential import with '-m 1' "

Description: Usually, when you perform a Sqoop job internally it searches for the primary key in the table. If there is no primary key the Sqoop job fails and the error looks like this" Error during import: No primary key could be found for the table . Please specify one with --split-by or perform a sequential import with '-m 1' ". The suggestion describes there are two alternative approaches to this scenario.

Best way is option 2

  1. To specify the number of mappers as 1 (default it takes 4). So by specifying the number of mappers to 1, the task will be sequential and identical to a single threaded task. This will succeed only when you are targeting a small table if in case if you are looking for a large import this will fail as the task tends to run forever.

  2. The best approach is to use split-by where you can specify the number of mappers on the bases of indexed columns or splitting column manually( with queries ).

like image 21
Tony79 Avatar answered Sep 23 '22 00:09

Tony79