The Cloudera documentation says that Hadoop does not support on-disk encryption. Would it be possible to use hardware-encrypted hard drives with Hadoop?
Hadoop provides several ways to encrypt stored data. The lowest level is volume encryption, which protects data if a disk volume is physically stolen or accidentally lost. The entire volume is encrypted, so this approach cannot encrypt specific files or directories selectively.
HDFS data-at-rest encryption allows data to be stored in encrypted HDFS directories called encryption zones. All files within an encryption zone are transparently encrypted and decrypted on the client side, so plaintext data is never stored in HDFS.
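As a rough illustration of setting up an encryption zone, the sketch below builds the usual command sequence: create a key in the configured key provider with `hadoop key create`, create an empty directory, then mark it as a zone with `hdfs crypto -createZone`. It assumes a Hadoop KMS is already configured and that the commands are run as the HDFS superuser; the key name `mykey` and the path `/secure` are hypothetical placeholders. By default it only prints the commands (dry run).

```python
import shlex
import subprocess
from typing import List


def key_create_cmd(key_name: str) -> List[str]:
    """Create an encryption key in the configured key provider (e.g. Hadoop KMS)."""
    return ["hadoop", "key", "create", key_name]


def create_zone_cmds(key_name: str, path: str) -> List[List[str]]:
    """Turn a new, empty HDFS directory into an encryption zone."""
    return [
        # The zone root must exist and be empty before -createZone is run.
        ["hdfs", "dfs", "-mkdir", "-p", path],
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", path],
        # List all zones to confirm the new one was created.
        ["hdfs", "crypto", "-listZones"],
    ]


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical key name and zone path; adjust for your cluster.
    run([key_create_cmd("mykey")] + create_zone_cmds("mykey", "/secure"))
```

Once the zone exists, clients reading and writing files under `/secure` need no code changes; the HDFS client fetches keys from the KMS and encrypts or decrypts on its side.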
eCryptfs can be used to do per-file encryption on each individual Hadoop node. It's rather tedious to set up, but it can certainly be done.
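A minimal sketch of the per-node approach, assuming each DataNode directory (the directories listed in `dfs.datanode.data.dir`) is overlay-mounted with eCryptfs onto itself. The paths below are hypothetical, and the option names (`key=passphrase`, `ecryptfs_cipher`, `ecryptfs_key_bytes`) should be checked against `mount.ecryptfs(8)` on your distribution; `mount.ecryptfs` typically prompts for the passphrase and for any options not supplied. Mount over empty directories, with the DataNode stopped, before it writes blocks there. The sketch only builds and prints the commands.

```python
import shlex
from typing import List


def ecryptfs_mount_cmd(data_dir: str, cipher: str = "aes", key_bytes: int = 16) -> List[str]:
    """Overlay-mount eCryptfs on a directory (lower and upper path are the same)."""
    options = ",".join([
        "key=passphrase",                  # passphrase-based key; prompted for at mount time
        f"ecryptfs_cipher={cipher}",       # cipher used for file contents
        f"ecryptfs_key_bytes={key_bytes}", # key size in bytes (16 = 128-bit AES)
    ])
    return ["mount", "-t", "ecryptfs", data_dir, data_dir, "-o", options]


if __name__ == "__main__":
    # Hypothetical DataNode data directories, one per disk.
    for d in ["/data/1/dfs/dn", "/data/2/dfs/dn"]:
        print(shlex.join(ecryptfs_mount_cmd(d)))
```

The same commands would have to be repeated (and the passphrase supplied) on every node after each reboot, which is part of what makes this approach tedious.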
Gazzang offers a turnkey commercial solution, built on top of eCryptfs, for securing "big data" through encryption, and partners with several Hadoop and NoSQL vendors.
Gazzang's cloud-based Encryption Platform for Big Data helps organizations transparently encrypt data stored in the cloud or on premises, using advanced key management and process-based access control lists, and helps them meet security and compliance requirements.
Full disclosure: I am one of the authors and current maintainers of eCryptfs. I am also Gazzang's Chief Architect and a lead developer.