
Securing Parquet Files Column-wise

I have been looking for a way to secure Parquet files, column-wise, for Spark access. Ideally, that would work the same way Apache Ranger works for Hive, i.e., a Sysadmin defines the access policies for different groups and columns.

I have been trying Ranger through Hortonworks HDP, but it seems that plug-ins for Spark and Parquet do not exist yet.

I have also been able to devise a solution using Apache Drill and views. However, it is not acceptable right now, mainly because community support for Drill is still scarce.

Has anyone faced the same requirement, or can anyone point me toward a solution?

Felipe Martins Melo asked Sep 18 '25 09:09



1 Answer

After a great deal of research, I've come to the conclusion that this is not possible out of the box.

Ranger works with other tools (HDFS, Hive, HBase, etc.) through plug-ins that implement hooks provided by those tools. For instance, to create a custom plug-in to secure Hive, one needs to create a HiveAuthorizer through the HiveAuthorizerFactory. There is no such hook for Parquet, because it is nothing more than a file format: no Parquet process sits in the read path where an authorization check could be called.
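To make the role of the hook concrete, here is a minimal sketch in plain Python. It is only a toy model: `ColumnAuthorizer`, `QueryEngine`, `read_file_directly` and the policy layout are hypothetical names invented for illustration, not Ranger or Hive APIs. It shows that a check can only be enforced where some engine actually calls it.

```python
from dataclasses import dataclass


class AccessDenied(Exception):
    """Raised when a group asks for a column it is not allowed to read."""


@dataclass
class ColumnAuthorizer:
    # Hypothetical policy shape: group -> table -> set of permitted columns.
    policies: dict

    def check(self, group, table, columns):
        allowed = self.policies.get(group, {}).get(table, set())
        denied = sorted(set(columns) - allowed)
        if denied:
            raise AccessDenied(f"{group} may not read {denied} from {table}")


@dataclass
class QueryEngine:
    tables: dict  # table name -> list of row dicts
    authorizer: ColumnAuthorizer

    def select(self, group, table, columns):
        # The hook: the engine consults the authorizer before touching data.
        self.authorizer.check(group, table, columns)
        return [{c: row[c] for c in columns} for row in self.tables[table]]


def read_file_directly(tables, table, columns):
    # No engine in the path, so nothing ever calls the authorizer.
    return [{c: row[c] for c in columns} for row in tables[table]]
```

In this model, a query that goes through `QueryEngine.select` is checked, while `read_file_directly` returns every requested column. That is the situation of a Spark job reading Parquet files straight from storage.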

One possible way to secure Parquet files column-wise from Ranger is to create an extension for Ranger's HDFS plug-in. This extension would enforce the access rules for Parquet files defined through Ranger. That way, we could secure Parquet files the same way we do Hive or HBase, as long as the files are stored in HDFS.
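As a rough idea of what such an extension would have to decide, the sketch below evaluates column rules for a file path. Everything in it is an assumption made for illustration: the `POLICIES` layout, the `permitted_columns` function and the example paths and groups are hypothetical and do not correspond to Ranger's policy model or to any existing plug-in.

```python
# Hypothetical policy records: which group may read which columns
# of the Parquet files stored under a given path prefix.
POLICIES = [
    {"path_prefix": "/data/hr/", "group": "analysts", "columns": {"name", "dept"}},
    {"path_prefix": "/data/hr/", "group": "payroll", "columns": {"name", "salary"}},
]


def permitted_columns(policies, path, groups, requested):
    """Return the requested columns that the caller's groups may read at path."""
    allowed = set()
    for policy in policies:
        if path.startswith(policy["path_prefix"]) and policy["group"] in groups:
            allowed.update(policy["columns"])
    # Preserve the caller's column order and drop anything not granted.
    return [column for column in requested if column in allowed]


if __name__ == "__main__":
    print(permitted_columns(
        POLICIES, "/data/hr/employees.parquet", {"analysts"},
        ["name", "salary", "dept"],
    ))
```

The resulting list is the kind of projection a reader could be restricted to, for example through the `columns` argument of `pyarrow.parquet.read_table`. The check only helps if it runs somewhere the user cannot bypass, which is why it would need to live in a plug-in rather than in client code.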

Felipe Martins Melo answered Sep 20 '25 02:09
