
Pyspark: How to save and apply IndexToString to convert labels back to original values in a new predicted dataset

I am using RandomForestClassifier from pyspark.ml, and one of the steps involves applying StringIndexer to the target variable of the training data to convert it into numeric label indices.

indexer = StringIndexer(inputCol = target_variable_name, outputCol = 'label').fit(df)
df = indexer.transform(df)

After fitting the final model, I save it using mlflow.spark.log_model(). So, when applying the model to a new dataset in the future, I just load the model again and apply it to the new data:

rfModel = mlflow.spark.load_model("models:/RandomForest_model/None")
predictions = rfModel.transform(new_data)

In new_data, the predictions come back as label indices and not as the original values. So, to get the original values, I have to use IndexToString:

labelConverter = IndexToString(inputCol="prediction", outputCol="predictedLabel",labels=indexer.labels)
predictions = labelConverter.transform(predictions)

So, the question is: only the model gets saved, not indexer.labels. How do I save the indexer.labels from my training dataset and use them on any new dataset? Can they be saved and retrieved in mlflow?

Apologies if I am sounding naïve here, but getting back the original values in the new dataset is really confusing me.

Deb asked Dec 07 '25 02:12


1 Answer

StringIndexerModel is the model produced by fitting a StringIndexer. It holds the labels, and it can be saved and loaded like any other fitted model.

You can save it to disk:

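from pyspark.ml.feature import StringIndexer, StringIndexerModel
# Fit the indexer on the training data; the result is a StringIndexerModel
indexer = StringIndexer(inputCol = target_variable_name, outputCol = 'label').fit(df)
indexer.save("string_indexer")
# Later, at prediction time, load it back; indexer.labels is available again
indexer = StringIndexerModel.load("string_indexer")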

or log it to mlflow:

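import mlflow
mlflow.spark.log_model(indexer, "string_indexer")
# Replace your_run_id with the ID of the run that logged the indexer
logged_model = 'runs:/your_run_id/string_indexer'
loaded = mlflow.spark.load_model(logged_model)
# If mlflow returns a PipelineModel wrapper, the indexer is its first stage
indexer = loaded.stages[0] if hasattr(loaded, "stages") else loaded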
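If you only need the labels and not the whole indexer, another option is to store the label list by itself. The labels of a fitted StringIndexer are an ordered list in which the position of each string is its index, so the list alone is enough to map predictions back. Below is a minimal plain-Python sketch of that idea, with no Spark involved. The helper names (save_labels, load_labels, indices_to_labels), the labels.json file name, and the example labels are all made up for illustration and are not part of pyspark or mlflow.

```python
import json
import tempfile
from pathlib import Path


def save_labels(labels, path):
    """Write the ordered label list to a JSON file (hypothetical helper)."""
    Path(path).write_text(json.dumps(list(labels)), encoding="utf-8")


def load_labels(path):
    """Read the ordered label list back from the JSON file (hypothetical helper)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def indices_to_labels(predictions, labels):
    """Map numeric predictions (e.g. 0.0, 2.0) back to the original strings."""
    return [labels[int(p)] for p in predictions]


# Made-up labels standing in for indexer.labels from the training run.
training_labels = ["cat", "dog", "bird"]

with tempfile.TemporaryDirectory() as tmp:
    labels_path = Path(tmp) / "labels.json"
    save_labels(training_labels, labels_path)
    restored = load_labels(labels_path)

# Made-up predictions standing in for the "prediction" column.
decoded = indices_to_labels([0.0, 2.0, 1.0], restored)
print(decoded)  # ['cat', 'bird', 'dog']
```

In a real job you would probably not decode in Python. You could pass the restored list as the labels argument of IndexToString, and keep the JSON file next to the model, for example as an mlflow artifact via mlflow.log_artifact.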

Hope this helps.

Felix Kemeth answered Dec 08 '25 21:12


