NULL values are displayed as '\N' when a Hive external table is queried.
Below is the sqoop import script:
sqoop import -libjars /usr/lib/sqoop/lib/tdgssconfig.jar,/usr/lib/sqoop/lib/terajdbc4.jar -Dmapred.job.queue.name=xxxxxx \
--connect jdbc:teradata://xxx.xx.xxx.xx/DATABASE=$db,LOGMECH=LDAP --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--username $user --password $pwd --query "
select col1,col2,col3 from $db.xxx
where \$CONDITIONS" \
--null-string '\N' --null-non-string '\N' \
--fields-terminated-by '\t' --num-mappers 6 \
--split-by job_number \
--delete-target-dir \
--target-dir $hdfs_loc
Please advise what change should be made to the script so that NULLs are displayed as NULL when the external Hive table is queried.
Sathiyan, below are my findings after many trials.
When the (--null-string '\N') property is included during the Sqoop import, NULLs are stored as '\N' for both integer and string columns. In your Sqoop script you specified --null-string '\N' --null-non-string '\N',
which means:
--null-string '\N' = The string to be written for a null value for string columns
--null-non-string '\N' = The string to be written for a null value for non-string columns
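One variant worth trying is the one shown in the Hive section of the Sqoop user guide. It notes that Sqoop uses these values in generated code, so the backslash has to be escaped as '\\N' for the token \N to end up in the files. The sketch below only prints the two options in that escaped form. The variable names null_token and null_opts are made up for illustration, and this has not been verified against the Teradata connection manager used in the question.

```shell
# Escaped null token: the single quotes stop the shell from touching the
# backslashes, so Sqoop receives the three characters \\N.
null_token='\\N'

# Null-handling options in the escaped form; they would replace
# --null-string '\N' --null-non-string '\N' in the import command.
null_opts="--null-string '$null_token' --null-non-string '$null_token'"

# printf with %s prints the value as-is, without interpreting backslashes.
printf '%s\n' "$null_opts"
```

The rest of the import command stays unchanged. If the table still shows '\N' afterwards, it may be worth checking which null format the Hive table itself expects.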