Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

hive-drop-import-delims not removing newline while using HCatalog in Sqoop

Sqoop while used with HCatalog import not able to remove new line (\n) from column data even after using --hive-drop-import-delims option in the command when running Apache Sqoop with Oracle.

Sqoop Query:

    sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username user123 --password passwd123 -table SCHEMA.TBL_2 \ 
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \ 
--split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string ""

Data in Oracle Column col4 as below: (Data has control characters such as ^M)

<li>Details:^M
    <ul>^M
        <li>

Does Control character causing this problem?

Am I missing anything ? Is there any workaround or solution for this problem?

like image 365
Suraj Nayak Avatar asked Jan 21 '15 20:01

Suraj Nayak


2 Answers

Use --map-column-java option to explicitly state the column is of type String. Then --hive-drop-import-delims works as expected (to remove \n from data).

Changed Sqoop Command :

sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username user123 --password passwd123 -table SCHEMA.TBL_2 \ 
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \ 
--split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string "" --map-column-java col4=String
like image 115
Suraj Nayak Avatar answered Sep 22 '22 05:09

Suraj Nayak


sqoop import \
--connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username 123 \
--password 123 \
--table SCHEMA.TBL_2 \
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
--split-by SOME_ID --columns col1,col2,col3,col4 \
--hive-delims-replacement "anything" \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string ""

You can try this --hive-delims-replacement "anything" this will replace all \n , \t , and \01 characters with the string you provided(in this case replace with string "anything").

like image 32
bunty Avatar answered Sep 25 '22 05:09

bunty