Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sqoop import Null string

The Null values are displayed as '\N' when a hive external table is queried.

Below is the sqoop import script:

sqoop import -libjars /usr/lib/sqoop/lib/tdgssconfig.jar,/usr/lib/sqoop/lib/terajdbc4.jar -Dmapred.job.queue.name=xxxxxx \ --connect jdbc:teradata://xxx.xx.xxx.xx/DATABASE=$db,LOGMECH=LDAP --connection-manager org.apache.sqoop.teradata.TeradataConnManager \ --username $user --password $pwd --query "

select col1,col2,col3 from $db.xxx

where \$CONDITIONS" \ --null-string '\N' --null-non-string '\N' \ --fields-terminated-by '\t' --num-mappers 6 \ --split-by job_number \ --delete-target-dir \ --target-dir $hdfs_loc

Please advise what change should be done to the script so that nulls are displayed as nulls when the external hive table is queried.

like image 982
Bagavathi Avatar asked Oct 27 '16 17:10

Bagavathi


2 Answers

Sathiyan- Below are my findings after many trials

  1. If (null string) property is not included during sqoop import, then NULLs are stored as [blank for integer columns] and [blank for string columns] in HDFS. 2.If the HIVE table on top of HDFS is queried, we would see [NULL for integer column] and [blank for String columns]
  2. If the (--null-string '\N') property is included during sqoop import, then NULLs are stored as ['\N' for both integer and string columns].
  3. If the HIVE table on top of HDFS is queried, we would see [NULL for both integer and string columns not '\N']
like image 59
Bagavathi Avatar answered Sep 22 '22 05:09

Bagavathi


In your sqoop script you mentioned --null-string '\N' --null-non-string '\N which means,

--null-string '\N'  = The string to be written for a null value for string columns 

--null-non-string '\N' = The string to be written for a null value for non-string columns
like image 38
Sathiyan S Avatar answered Sep 25 '22 05:09

Sathiyan S