Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sqoop import from MySQL: Decimals always imported into Hive as String

I'm trying to import data from MySQL into Hive (Cloudera 5.8 cluster) with 4 data nodes for HDFS. Using Sqoop 1.4.6 (enclosed in Cloudera 5.8) I've been trying to import directly from MySQL into Hive and it works except..all my Decimal (18,2) fields form MySQL tables (InnoDB format) are always imported as a Hive String data type which is obviously wrong. Here is my Sqoop bash (on CentOS 6.7) script":

sqoop import --null-string '\\N' --null-non-string '\\N' --hive-delims-replacement '\0D' --fields-terminated-by '\001' --connect jdbc:mysql://xxx.xxx.xxx/live_data --username (user_name) --password 'xxxxxx' --table (table name) --hive-import --hive-database (hive_database) --hive-table (table name) --target-dir "(target directory)" --as-parquetfile --hive-overwrite --map-column-hive 'price=DECIMAL\(18%2C2\)'

The map-column-hive seems to be allowed (meaning, no errors, although I've read of fixes in 1.4.7) and I've substituted "%2C" for a comma (as you get an error when you use a "," directly").

I've checked the Hive Decimal Type and a size of 18,2 appears to be supported (cloudera 5.1+). I've tried a number of variations: using double quotes instead of single quites, removing all delimiter specifiers using

("-null-string '\\N' --null-non-string '\\N' --hive-delims-replacement '\0D' --fields-terminated-by '\001'") 

which really helped for XML data another table, mucked up the data before I used those settings).

I've also tried reducing the number of digits (for example 8,2) but no matter what I do I seems to get my decimals from MySQL are all converted into the Hive string type which is terrible for numeric aggregate functions.

I've looked into upgrading my Sqoop to 1.4.7 but I've had errors when running it after compiling which is a separate question. For some reason either my map-column-hive parameter is ignored or isn't working.

Does anyone have an answer how I can get this to properly import from decimal (18,2) into Hive Decimal (18,2) or at least a shorter number with 2 decimal places.

Thanks all!

like image 627
Evolutionary Avatar asked Nov 25 '25 08:11

Evolutionary


1 Answers

Sqoop Import + Parquet + Decimal is broken. It's fixed for avro here: https://issues.apache.org/jira/browse/SQOOP-1493

You can try converting into double using --map-column-hive.

like image 138
Nihhaar Avatar answered Nov 27 '25 22:11

Nihhaar



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!