
Sqoop import as ORC file

Tags:

rdbms

hdfs

sqoop

Is there an option in Sqoop to import data from an RDBMS and store it in HDFS in the ORC file format?

Alternative tried: I imported the data as text, created a temporary Hive table over the text files, and then used Hive to write the data back to HDFS as ORC.
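The two-step workaround described above could be sketched roughly as follows. The table names my_db.fact_staging and my_db.fact_orc are hypothetical, and the RUN_HIVE switch is only an illustrative guard so that the script prints the statement unless execution is explicitly requested.

```shell
#!/usr/bin/env bash
# Sketch of the text-then-ORC workaround; all table names are hypothetical.
set -euo pipefail

STAGING_TABLE="my_db.fact_staging"   # assumed: Hive table over the text files Sqoop imported
ORC_TABLE="my_db.fact_orc"           # assumed: name of the ORC target table

# A CREATE TABLE ... AS SELECT rewrites the staged text rows as ORC in one statement.
HQL="CREATE TABLE ${ORC_TABLE} STORED AS ORC AS SELECT * FROM ${STAGING_TABLE};"

# Always show the statement that would be run.
echo "${HQL}"

# Only execute when explicitly requested (RUN_HIVE=1), since a Hive client may not be installed.
if [ "${RUN_HIVE:-0}" = "1" ]; then
  hive -e "${HQL}"
fi
```

This still needs the intermediate text copy, which is the extra step the question is trying to avoid.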

Rajashekar Reddy Peta asked Apr 30 '15 21:04


1 Answer

At least in Sqoop 1.4.5, there is HCatalog integration that supports the ORC file format (among others).

For example, you can use the option

--hcatalog-storage-stanza

which can be set to

stored as orc tblproperties ("orc.compress"="SNAPPY")

Example:

sqoop import \
 --connect jdbc:postgresql://foobar:5432/my_db \
 --driver org.postgresql.Driver \
 --connection-manager org.apache.sqoop.manager.GenericJdbcManager \
 --username foo \
 --password-file hdfs:///user/foobar/foo.txt \
 --table fact \
 --hcatalog-home /usr/hdp/current/hive-webhcat \
 --hcatalog-database my_hcat_db \
 --hcatalog-table fact \
 --create-hcatalog-table \
 --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")'
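To check that the import really produced an ORC table, one option is to look at the input format reported by Hive. The following is a minimal sketch, assuming a hive client is available on the machine; the helper name is_orc_table and the RUN_HIVE switch are made up for illustration and are not part of Sqoop or Hive.

```shell
#!/usr/bin/env bash
# Sketch: check whether DESCRIBE FORMATTED output indicates ORC storage.

# Hypothetical helper: reads DESCRIBE FORMATTED text on stdin and
# succeeds if the ORC input format class is listed.
is_orc_table() {
  grep -qF 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
}

# Opt-in (RUN_HIVE=1) so the script is harmless on machines without a Hive client.
if [ "${RUN_HIVE:-0}" = "1" ]; then
  if hive -e 'DESCRIBE FORMATTED my_hcat_db.fact' | is_orc_table; then
    echo "my_hcat_db.fact is stored as ORC"
  else
    echo "my_hcat_db.fact is not stored as ORC"
  fi
fi
```

The database and table names are taken from the example command above; adjust them to match your own import.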
selle answered Jan 03 '23 17:01