
Error running spark on databricks: constructor public XXX is not whitelisted

I was using Azure Databricks and trying to run some example Python code from this page, and got this exception:

py4j.security.Py4JSecurityException: Constructor public org.apache.spark.ml.classification.LogisticRegression(java.lang.String) is not whitelisted.

Thanks, Lidong

lidong asked Mar 30 '19 03:03


1 Answer

This error shows up with some library methods when using a High Concurrency cluster with credential passthrough enabled. If that is your scenario, one possible workaround is to use a different cluster mode.

py4j.security.Py4JSecurityException: ... is not whitelisted

This exception is thrown when you have accessed a method that Azure Databricks has not explicitly marked as safe for Azure Data Lake Storage credential passthrough clusters. In most cases, this means that the method could allow a user on an Azure Data Lake Storage credential passthrough cluster to access another user's credentials.

Reference: https://docs.azuredatabricks.net/spark/latest/data-sources/azure/adls-passthrough.html
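If you want to recognize this failure in a notebook and point yourself (or a teammate) at the cluster mode rather than at the library, a small helper along these lines may be useful. This is only a sketch: `explain_whitelist_error` and the hint text are hypothetical names of my own, not part of Databricks or Py4J, and the pattern assumes the exception text looks like the message in the question.

```python
import re

# Assumption: the error text follows the format shown in the question, e.g.
# "py4j.security.Py4JSecurityException: Constructor public <signature> is not whitelisted."
_NOT_WHITELISTED = re.compile(
    r"Py4JSecurityException:\s*(?P<member>.+?)\s+is not whitelisted"
)


def explain_whitelist_error(message):
    """Return a short hint if `message` is a 'not whitelisted' error, else None.

    Hypothetical helper for illustration only; it just inspects the text.
    """
    match = _NOT_WHITELISTED.search(message)
    if match is None:
        return None
    return (
        "Blocked call: " + match.group("member") + "\n"
        "Likely cause: credential passthrough is enabled on this cluster. "
        "Consider running the code on a cluster with a different mode."
    )


if __name__ == "__main__":
    sample = (
        "py4j.security.Py4JSecurityException: Constructor public "
        "org.apache.spark.ml.classification.LogisticRegression(java.lang.String) "
        "is not whitelisted."
    )
    print(explain_whitelist_error(sample))
```

In a notebook you could wrap the failing cell body in `try` / `except Exception as e` and pass `str(e)` to the helper. Whether `str(e)` carries the full Java message depends on how the error surfaces, so treat that as an assumption to check.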

Dustin V answered Oct 05 '22 09:10