From Sqoop's perspective, what is the difference between importing a relational table as a sequence file, like this:
sqoop import --connect connectionString \
--username userName -P --table tableName \
--as-sequencefile
and importing it as an Avro data file, like this:
sqoop import --connect connectionString \
--username userName -P --table tableName \
--as-avrodatafile
What is the actual difference between a sequence file and an Avro data file?
SequenceFiles are a binary format that stores individual records in custom record-specific data types. This format supports exact storage of all data in binary representations, and is appropriate for storing binary data (for example, VARBINARY columns), or data that will be principally manipulated by custom MapReduce programs (reading from SequenceFiles is higher-performance than reading from text files, as records do not need to be parsed).
Avro data files are a compact, efficient binary format that provides interoperability with applications written in other programming languages. Avro also supports versioning, so that when, e.g., columns are added or removed from a table, previously imported data files can be processed along with new ones.
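To make the versioning point concrete, here is a rough sketch in plain Python of the idea behind reading old and new files with one reader schema: columns missing from an older file are filled from the reader's defaults, and columns the reader no longer knows about are dropped. This is only an illustration of the concept, not the Avro library or its API. The names `resolve` and `READER_FIELDS` and the sample records are made up for the example, and real Avro raises an error when a missing field has no default instead of silently using `None`.

```python
# Simplified illustration of Avro-style schema resolution; NOT the Avro library.
# All names here (resolve, READER_FIELDS, the sample records) are hypothetical.

# Reader schema: the columns the current job expects, with defaults for
# columns that were added after the first import.
READER_FIELDS = [
    {"name": "id", "default": None},
    {"name": "name", "default": None},
    {"name": "email", "default": "unknown"},  # column added later
]


def resolve(record, reader_fields):
    """Project a record written with a different schema onto the reader schema."""
    resolved = {}
    for field in reader_fields:
        # Use the stored value when present, otherwise the reader's default.
        resolved[field["name"]] = record.get(field["name"], field["default"])
    # Fields absent from the reader schema (removed columns) are dropped.
    return resolved


# A record from an earlier import (no email column, one since-removed column)
# and a record from a later import.
old_record = {"id": 1, "name": "alice", "legacy_flag": True}
new_record = {"id": 2, "name": "bob", "email": "bob@example.com"}

for rec in (old_record, new_record):
    print(resolve(rec, READER_FIELDS))
```

A SequenceFile written by Sqoop has no equivalent of this step: its records are tied to the generated record class, so a changed table layout generally means regenerating that class before old files can be read.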
Here's a comparison by Doug Cutting himself:
http://www.quora.com/What-are-the-advantages-of-Avros-object-container-file-format-over-the-SequenceFile-container-format