
Sqoop, Avro and Hive

Tags:

hive

avro

sqoop

I'm currently importing from MySQL into HDFS using Sqoop in Avro format, and this works great. However, what's the best way to load these files into Hive?
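For reference, the import looks roughly like this (the connection string, table name, and target directory are placeholders for my actual setup):

```shell
# Hypothetical connection/table values; --as-avrodatafile tells Sqoop
# to write Avro data files instead of text.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table orders \
  --as-avrodatafile \
  --target-dir /user/hive/warehouse/orders_avro
```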

Since Avro files contain the schema, I could pull the files down to the local file system, extract the schema with avro-tools, and create the table from that, but this seems excessive. Is there a better way?

Also, if a column is dropped from a table in MySQL, can I still load the old files into a new Hive table created with the new Avro schema (i.e. with the dropped column missing)?

Asked Jan 24 '26 by Andrew Stevenson

1 Answer

Since version 0.9.1, Hive has come packaged with an Avro SerDe. This allows Hive to read Avro files directly while Avro still "owns" the schema.
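A minimal sketch of a table definition using that SerDe, assuming a hypothetical `orders` table and that you've placed the `.avsc` schema file (e.g. extracted with `avro-tools getschema`, or the one Sqoop generates during import) somewhere on HDFS:

```sql
-- Hypothetical names and paths; avro.schema.url points Hive at the
-- Avro schema, so the table picks up its columns from the .avsc file.
CREATE EXTERNAL TABLE orders_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/hive/warehouse/orders_avro'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/orders.avsc');
```

Because the table is external and the schema lives in the `.avsc` file, you avoid hand-writing column definitions entirely.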

For your second question, you can define the Avro schema with column defaults. When you add a new column, just make sure to specify a default, and all your old Avro files will work just fine in a new Hive table.
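For example, a schema along these lines (field names are hypothetical) lets Avro's schema resolution fill in `status` for old files that were written before the column existed:

```json
{
  "type": "record",
  "name": "orders",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "status", "type": ["null", "string"], "default": null}
  ]
}
```

When a reader uses this schema against an older file whose writer schema lacks `status`, the default value is substituted for every record.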

To get started, see the AvroSerDe documentation on the Hive wiki; the book Programming Hive (available on Safari Books Online) also has a section on the Avro SerDe that you might find more readable.

Answered Jan 27 '26 by Daniel Koverman


