
Difference between Avrodata file and Sequence file with respect to Apache sqoop

From Sqoop's perspective, what is the difference between importing a relational table as a sequence file, like this:

sqoop import --connect connectionString \
--username userName -P --table tableName \
--as-sequencefile

and importing it as an Avro data file, like this:

sqoop import --connect connectionString \
--username userName -P --table tableName \
--as-avrodatafile

What is the actual difference between a sequence file and an Avro data file?

SparkOn asked Dec 09 '22 07:12



1 Answer

SequenceFiles are a binary format that stores individual records in custom record-specific data types. This format supports exact storage of all data in binary representations, and is appropriate for storing binary data (for example, VARBINARY columns), or data that will be principally manipulated by custom MapReduce programs (reading from SequenceFiles is higher-performance than reading from text files, as records do not need to be parsed).
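Because the output is binary, a quick way to check what an import actually produced is to look at the first bytes of a part file: a SequenceFile starts with the ASCII bytes SEQ, while an Avro data file starts with Obj. The following is only a rough sketch. detect_format is a made-up helper name, and the demo files are stand-ins that contain just the header bytes, not real Sqoop output.

```shell
#!/usr/bin/env bash
# Sketch: tell a SequenceFile from an Avro data file by its leading magic bytes.
# SequenceFiles begin with "SEQ"; Avro container files begin with "Obj".

detect_format() {
  # $1: path to a local copy of an imported part file (hypothetical name)
  local magic
  magic="$(head -c 3 "$1")"
  case "$magic" in
    SEQ) echo "sequencefile" ;;
    Obj) echo "avro" ;;
    *)   echo "unknown" ;;
  esac
}

# Demo with stand-in files holding only header bytes (not real imports).
demo_dir="$(mktemp -d)"
printf 'SEQ\006' > "$demo_dir/part-seq"
printf 'Obj\001' > "$demo_dir/part-avro"
detect_format "$demo_dir/part-seq"
detect_format "$demo_dir/part-avro"
```

For a real import you would first copy a part file out of HDFS (for example with hadoop fs -get) and pass the local path to detect_format.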

Avro data files are a compact, efficient binary format that provides interoperability with applications written in other programming languages. Avro also supports versioning, so that when, e.g., columns are added to or removed from a table, previously imported data files can be processed along with new ones.
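To make the versioning point concrete, here is a small sketch of two versions of an Avro record schema. The column names (id, name, email) and file names are invented for illustration, and the schema Sqoop really generates for your table may look different. The idea is that a field added later carries a default, so a reader using the newer schema can still process files written with the older one and fills in the default where the value is missing.

```shell
#!/usr/bin/env bash
# Sketch: two versions of an Avro schema for a hypothetical table "tableName".
schema_dir="$(mktemp -d)"

# v1: schema at the time of the first import (columns id, name).
cat > "$schema_dir/tableName_v1.avsc" <<'EOF'
{
  "type": "record",
  "name": "tableName",
  "fields": [
    {"name": "id",   "type": ["null", "int"],    "default": null},
    {"name": "name", "type": ["null", "string"], "default": null}
  ]
}
EOF

# v2: after a hypothetical column "email" was added to the table.
# The new field has a default, so files written with v1 stay readable:
# a reader using v2 substitutes the default for the missing field.
cat > "$schema_dir/tableName_v2.avsc" <<'EOF'
{
  "type": "record",
  "name": "tableName",
  "fields": [
    {"name": "id",    "type": ["null", "int"],    "default": null},
    {"name": "name",  "type": ["null", "string"], "default": null},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
EOF

# Show the field that exists only in v2.
grep '"email"' "$schema_dir/tableName_v2.avsc"
```

A SequenceFile import has no equivalent of this: its records are tied to the record-specific class that was generated when the import ran.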

Here's a comparison by Doug Cutting himself:

http://www.quora.com/What-are-the-advantages-of-Avros-object-container-file-format-over-the-SequenceFile-container-format

dpsdce answered May 16 '23 07:05
