
Advantages of Sequence File over HDFS text file

What is the advantage of a Hadoop sequence file over a flat text file in HDFS? In what way is a sequence file more efficient?

Small files can be combined and written into a sequence file, but the same can be done with an HDFS text file. What is the difference between the two approaches? I have been googling this for a while, and it would be helpful to get some clarity.

hrkrshn asked Aug 02 '12 13:08


2 Answers

  1. Sequence files are a natural fit when you want to store keys with their corresponding values. You can do the same in a text file, but then you have to parse each line yourself.
  2. They can be compressed and still remain splittable, so a large file can still be processed by many tasks in parallel. A compressed text file cannot be split unless it uses a splittable compression format.
  3. They store data in binary form, which is more storage efficient. In a text file, a double is written as a string of characters, which can take considerably more space than its 8-byte binary form.
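To make the storage point concrete, here is a minimal sketch in plain Python (not Hadoop code) that compares the size of a few doubles written as text lines with the same values packed as 8-byte big-endian binary doubles. The sample values are arbitrary and chosen only for illustration.

```python
import struct

# Arbitrary sample values, used only for illustration.
values = [3.141592653589793, 2.718281828459045, 1.4142135623730951]

# Text encoding: one decimal string per line, as in a plain text file.
text_bytes = "".join(f"{v!r}\n" for v in values).encode("utf-8")

# Binary encoding: each double as a fixed 8-byte big-endian IEEE 754 value.
binary_bytes = struct.pack(f">{len(values)}d", *values)

# The binary form also round-trips exactly, with no string parsing.
decoded = list(struct.unpack(f">{len(values)}d", binary_bytes))

print("text bytes:  ", len(text_bytes))
print("binary bytes:", len(binary_bytes))
```

The binary size is always 8 bytes per value, while the text size depends on how many digits each value needs. Short values such as `1.0` can be smaller as text, so the saving depends on the data.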
Razvan answered Nov 15 '22 16:11


Advantages of Hadoop sequence files (as per Siva's article on the hadooptutorial.info website):

  1. They are more compact than text files.
  2. They support compression at different levels: record or block.
  3. Files can be split and processed in parallel.
  4. They can solve the "large number of small files" problem in Hadoop, whose main strength is processing large files with MapReduce jobs. A sequence file can serve as a container for a large number of small files.
  5. The temporary output of mappers can be stored in sequence files.
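The container and compression points above can be illustrated with a toy sketch in plain Python. It is not Hadoop's actual on-disk format: the length-prefixed record layout, the file names, and the use of `zlib` as a stand-in codec are all assumptions made for illustration. It packs many small "files" as key/value records, then compares compressing each value on its own (roughly the record-level idea) with compressing a whole batch of values together (roughly the block-level idea).

```python
import struct
import zlib


def pack_record(key: bytes, value: bytes) -> bytes:
    """Serialize one record as: key length, value length, key, value."""
    return struct.pack(">II", len(key), len(value)) + key + value


def unpack_records(data: bytes):
    """Yield (key, value) pairs back from concatenated records."""
    offset = 0
    while offset < len(data):
        key_len, value_len = struct.unpack_from(">II", data, offset)
        offset += 8  # two 4-byte length fields
        key = data[offset:offset + key_len]
        value = data[offset + key_len:offset + key_len + value_len]
        offset += key_len + value_len
        yield key, value


# Hypothetical small files: file name as key, contents as value.
small_files = [
    (f"logs/part-{i:03d}.txt".encode(), f"status=OK host=node{i}\n".encode())
    for i in range(100)
]

# One container holding all the small files as key/value records.
container = b"".join(pack_record(k, v) for k, v in small_files)
restored = list(unpack_records(container))

# Record-level style: every value is compressed independently.
record_compressed = sum(len(zlib.compress(v)) for _, v in small_files)

# Block-level style: many values are compressed together in one batch.
block_compressed = len(zlib.compress(b"".join(v for _, v in small_files)))

print("round trip ok:", restored == small_files)
print("record-level bytes:", record_compressed)
print("block-level bytes: ", block_compressed)
```

With many small, similar values, batching gives the compressor repeated patterns to exploit, which is why block-style compression tends to come out smaller here. The real format also writes sync markers so that readers can start at split boundaries, which this sketch leaves out.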

Disadvantages:

  1. Sequence files are append-only.
Ravindra babu answered Nov 15 '22 16:11