I'm currently importing from MySQL into HDFS using Sqoop in Avro format, and this works great. However, what's the best way to load these files into Hive?
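For reference, the import looks roughly like this (the connection details, table name, and paths are placeholders):

```
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table mytable \
  --as-avrodatafile \
  --target-dir /user/me/mytable
```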
Since Avro files contain the schema, I could pull the files down to the local file system, extract the schema with avro-tools, and create the table from that, but this seems excessive?
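That workflow would look something like this (the avro-tools version and file names are just examples):

```
# Copy one Avro data file out of HDFS, then dump its embedded schema
hadoop fs -get /user/me/mytable/part-m-00000.avro .
java -jar avro-tools-1.7.7.jar getschema part-m-00000.avro > mytable.avsc
```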
Also, if a column is dropped from a table in MySQL, can I still load the old files into a new Hive table created with the new Avro schema (i.e., with the dropped column missing)?
As of version 0.9.1, Hive comes packaged with an Avro SerDe. This allows Hive to read from Avro files directly while Avro still "owns" the schema.
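A minimal sketch of a table definition using the SerDe, assuming you've uploaded the extracted schema to HDFS (all paths and names below are placeholders):

```
CREATE EXTERNAL TABLE mytable
-- No columns are declared here; Hive reads them from the Avro schema
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/me/mytable'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/me/schemas/mytable.avsc');
```

Because the table is EXTERNAL and points at the directory Sqoop wrote to, there's no separate load step; the data is queryable as soon as the table is created.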
For your second question, you can define the Avro schema with column defaults. When you add a new column, just make sure to specify a default, and all your old Avro files will work just fine in a new Hive table.
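For example, a newly added column with a null default might look like this in the .avsc (the record and field names are illustrative); Avro's schema resolution fills in the default when reading old files that lack the field:

```
{
  "type": "record",
  "name": "mytable",
  "fields": [
    {"name": "id",         "type": "long"},
    {"name": "name",       "type": "string"},
    {"name": "new_column", "type": ["null", "string"], "default": null}
  ]
}
```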
To get started, see the AvroSerDe documentation on the Hive wiki; the book Programming Hive (available on Safari Books Online) also has a section on the Avro SerDe, which you might find more readable.