How to overwrite Spark ML model in PySpark?

from pyspark.ml.regression import RandomForestRegressor

rf = RandomForestRegressor(labelCol="label", featuresCol="features", numTrees=5, maxDepth=10, seed=42)
rf_model = rf.fit(train_df)
rf_model_path = "./hdfsData/" + "rfr_model"
rf_model.save(rf_model_path)

These lines worked the first time I saved the model. But when I tried to save the model to the same path again, I got this error:

Py4JJavaError: An error occurred while calling o1695.save. : java.io.IOException: Path ./hdfsData/rfr_model already exists. Please use write.overwrite().save(path) to overwrite it.

Then I tried:

rf_model.write.overwrite().save(rf_model_path)

It gave:

AttributeError: 'function' object has no attribute 'overwrite'

It seems the pyspark.mllib module provides an overwrite function, but the pyspark.ml module does not. Does anyone know how to overwrite the old model with the new one? Thanks.

Asked Feb 17 '17 by Veronica Wenqian Cheng

1 Answer

The message you see is a Java-side error, not a Python one. In pyspark.ml, write is a method, not a property, so you need to call it first:

rf_model.write().overwrite().save(rf_model_path)
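This works because calling write() returns an MLWriter object, whose overwrite() enables overwriting and returns the writer so the calls can be chained into save(). The AttributeError in the question arises from accessing write without calling it, which yields the method object itself. The mechanics can be sketched without Spark; the toy classes below are hypothetical stand-ins, not the real PySpark API:

```python
# Toy sketch of why `rf_model.write.overwrite()` fails: `write` is a method,
# so accessing it without calling it returns the bound method object, which
# has no `overwrite` attribute.

class ToyWriter:
    """Mimics the chaining style of pyspark.ml.util.MLWriter (hypothetical)."""
    def __init__(self):
        self._overwrite = False

    def overwrite(self):
        # Enable overwriting and return self so calls can be chained.
        self._overwrite = True
        return self

    def save(self, path):
        return f"saved to {path} (overwrite={self._overwrite})"

class ToyModel:
    def write(self):  # a method, not a property
        return ToyWriter()

model = ToyModel()

# Wrong: `model.write` is the method object itself, so this raises AttributeError.
try:
    model.write.overwrite()
except AttributeError as err:
    print(err)

# Right: call write() first to get the writer, then chain overwrite() and save().
print(model.write().overwrite().save("./hdfsData/rfr_model"))
```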
Answered Nov 10 '22 by zero323