
sqoop import multiple tables

We are using Cloudera CDH 4 and we are able to import tables from our Oracle databases into our HDFS warehouse as expected. The problem is that we have tens of thousands of tables in our databases, and Sqoop only supports importing one table at a time.

What options are available for importing multiple tables into HDFS or Hive? For example, what would be the best way to import 200 tables from Oracle into HDFS or Hive at a time?

The only solution I have seen so far is to create a Sqoop job for each table import and then run them all individually. Since Hadoop is designed to work with large datasets, it seems like there should be a better way.

Danny Westfall asked Jun 19 '13 14:06



2 Answers

You can use the "import-all-tables" tool to load all tables into HDFS in one run. Each table is written to its own directory.

sqoop import-all-tables --connect jdbc:mysql://localhost/sqoop --username root --password hadoop

To skip some tables, add the "--exclude-tables" option with a comma-separated list of table names.

Ex:

sqoop import-all-tables --connect jdbc:mysql://localhost/sqoop --username root --password hadoop --exclude-tables <table1>,<table2>

To store the imported tables under a specific parent directory, use the "--warehouse-dir" option ("--target-dir" is not among the arguments that import-all-tables accepts).

Ex:

sqoop import-all-tables --connect jdbc:mysql://localhost/sqoop --username root --password hadoop --warehouse-dir '/Sqoop'
Kumar Reddy Basapuram answered Sep 19 '22 20:09



  1. Assuming the Sqoop configuration is the same for every table, you can list all the tables you need to import and then iterate over them, launching one Sqoop job per table (ideally asynchronously). You can run the following query to fetch the list of tables from Oracle: SELECT owner, table_name FROM dba_tables

  2. Sqoop does offer a tool to import all tables (import-all-tables), although it has some limitations.

  3. Modify the Sqoop source code and recompile it to suit your needs. The Sqoop codebase is well documented and nicely arranged.
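
Option 1 could look roughly like the bash sketch below. It is only an illustration under some assumptions: the table names have already been exported to a text file (one per line), every table uses the same import settings, and the connection URL, user name, and warehouse directory are hypothetical placeholders. SQOOP_BIN, import_table, and import_tables are names made up for this example, not part of Sqoop.

```shell
#!/usr/bin/env bash
# Sketch: launch one "sqoop import" per table, a few at a time.
# Every value below is a hypothetical placeholder; adjust to your cluster.
SQOOP_BIN="${SQOOP_BIN:-sqoop}"   # set SQOOP_BIN=echo for a dry run
CONNECT_URL="${CONNECT_URL:-jdbc:oracle:thin:@//dbhost:1521/ORCL}"
DB_USER="${DB_USER:-scott}"
DB_PASSWORD="${DB_PASSWORD:-}"
WAREHOUSE_DIR="${WAREHOUSE_DIR:-/Sqoop}"

# Import a single table; its files land in $WAREHOUSE_DIR/<table>.
import_table() {
  "$SQOOP_BIN" import \
    --connect "$CONNECT_URL" \
    --username "$DB_USER" \
    --password "$DB_PASSWORD" \
    --table "$1" \
    --warehouse-dir "$WAREHOUSE_DIR"
}

# $1: file with one table name per line
# $2: how many imports to run at once (default 4)
import_tables() {
  local list_file="$1" batch="${2:-4}" running=0 table
  while IFS= read -r table; do
    [ -z "$table" ] && continue          # skip blank lines
    # Run in the background; report a failed import on stderr.
    { import_table "$table" < /dev/null || echo "FAILED: $table" >&2; } &
    running=$((running + 1))
    if [ "$running" -ge "$batch" ]; then
      wait                               # let the current batch finish
      running=0
    fi
  done < "$list_file"
  wait                                   # wait for the last, partial batch
}
```

After sourcing the script, something like `import_tables tables.txt 8` would start the imports eight at a time. Waiting for a whole batch keeps the script simple, but one slow table holds up the rest of its batch, so a proper job queue may suit very large table lists better.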

Jit B answered Sep 17 '22 20:09
