
Does HDFS encrypt or compress the data while storing?

Tags:

hadoop

hdfs

When I put a file into HDFS, for example

$ ./bin/hadoop dfs -put /source/file input
  • Is the file compressed while storing?
  • Is the file encrypted while storing? Is there a config setting that we can specify to change whether it is encrypted or not?
Lazer asked Sep 19 '11 04:09



1 Answer

There is no implicit compression in HDFS. In other words, if you want your data to be compressed, you have to write it that way. If you plan on writing MapReduce jobs to process the compressed data, you'll want to use a splittable compression format such as bzip2. A plain gzip file is not splittable, so a single mapper has to read the whole file.
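
Since HDFS stores exactly the bytes it is given, one way to get compressed data into it is to compress the file on the client before uploading. The following is a minimal Python sketch of that idea, not an official Hadoop workflow. The helper names compress_for_hdfs and put_command are hypothetical, and the default ./bin/hadoop path is taken from the question and may differ on your installation.

```python
import bz2
import shutil
from pathlib import Path


def compress_for_hdfs(src):
    """Write a bzip2 copy of `src` next to it and return the new path.

    bzip2 is used because it is a splittable format for MapReduce input.
    """
    src_path = Path(src)
    dst_path = src_path.with_name(src_path.name + ".bz2")
    # Stream in chunks so large files are not loaded into memory at once.
    with open(src_path, "rb") as fin, bz2.open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst_path


def put_command(local_path, hdfs_dest, hadoop_bin="./bin/hadoop"):
    """Build (but do not run) the upload command as an argument list.

    `hadoop_bin` is an assumption; point it at your own hadoop launcher.
    """
    return [hadoop_bin, "dfs", "-put", str(local_path), hdfs_dest]
```

To actually upload, the argument list could be handed to subprocess.run(put_command(compress_for_hdfs("/source/file"), "input"), check=True) on a machine where the hadoop launcher is available.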

Hadoop can process compressed files, and here is a nice article on it. The intermediate (map) output and the final MapReduce job output can also be compressed.
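
Compression of the intermediate and final job output is controlled by job configuration properties. As a rough sketch, the Python helper below builds the -D options that could be appended to a job's command line. The function name compression_options is hypothetical, the property names differ between older (mapred.*) and newer (mapreduce.*) Hadoop releases, and -D options are only honored by jobs that parse Hadoop's generic options, so check all of this against your version.

```python
def compression_options(codec="org.apache.hadoop.io.compress.BZip2Codec",
                        legacy_names=False):
    """Return a list of -D arguments enabling map and job output compression.

    legacy_names=True selects the older mapred.* property names;
    otherwise the newer mapreduce.* names are used.
    """
    if legacy_names:
        props = {
            "mapred.compress.map.output": "true",   # intermediate map output
            "mapred.output.compress": "true",       # final job output
            "mapred.output.compression.codec": codec,
        }
    else:
        props = {
            "mapreduce.map.output.compress": "true",
            "mapreduce.output.fileoutputformat.compress": "true",
            "mapreduce.output.fileoutputformat.compress.codec": codec,
        }
    args = []
    for key, value in props.items():
        args += ["-D", f"{key}={value}"]
    return args
```

The same properties can instead be set in the job's configuration files or in the job driver code, which avoids repeating them on every command line.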

There is a JIRA on 'Transparent compression in HDFS', but I don't see much progress on it.

I don't think there is a separate API for encryption, though you can also use a compression codec for encryption/decryption. Here are more details about encryption and HDFS.

Praveen Sripati answered Oct 05 '22 12:10
