
What are StringIndexer and VectorIndexer, and how do I use them?

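import org.apache.spark.ml.feature.IndexToString;
import org.apache.spark.ml.feature.StringIndexer;
import org.apache.spark.ml.feature.StringIndexerModel;
import org.apache.spark.ml.feature.VectorIndexer;
import org.apache.spark.ml.feature.VectorIndexerModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> dataFrame = ... ;

// Index the string labels in "label" into numeric indices.
StringIndexerModel labelIndexer = new StringIndexer()
        .setInputCol("label")
        .setOutputCol("indexedLabel")
        .fit(dataFrame);

// Index categorical features inside the vector column "s";
// features with more than 4 distinct values are treated as continuous.
VectorIndexerModel featureIndexer = new VectorIndexer()
        .setInputCol("s")
        .setOutputCol("indexedFeatures")
        .setMaxCategories(4)
        .fit(dataFrame);

// Convert predicted indices back to the original string labels.
IndexToString labelConverter = new IndexToString()
        .setInputCol("prediction")
        .setOutputCol("predictedLabel")
        .setLabels(labelIndexer.labels());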

What are StringIndexer, VectorIndexer, and IndexToString, and what is the difference between them? How and when should I use them?

asked May 26 '17 07:05 by Manikandan Balasubramanian


1 Answer

I only know about two of them: StringIndexer and VectorIndexer.

StringIndexer:

  • converts a single string column into a column of numeric indices (similar to a factor column in R). By default the most frequent value gets index 0.
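To make the mapping concrete, here is a minimal plain-Java sketch of what StringIndexer computes, with no Spark dependency. The class and method names (StringIndexerSketch, fitLabels, transform, toLabel) are hypothetical and are not Spark's API, and the alphabetical tie-break is an assumption of this sketch. The last method shows the reverse lookup that IndexToString performs with the labels array from the question's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Spark's implementation.
class StringIndexerSketch {

    // "fit": distinct labels ordered by descending frequency.
    // A label's position in the returned list is its index.
    public static List<String> fitLabels(List<String> column) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : column) {
            counts.merge(value, 1, Integer::sum);
        }
        List<String> labels = new ArrayList<>(counts.keySet());
        // Most frequent first; ties broken alphabetically (assumption of this sketch).
        labels.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return labels;
    }

    // "transform": replace each string with the index of its label.
    // Labels not seen during fit are not handled here (indexOf yields -1).
    public static double[] transform(List<String> column, List<String> labels) {
        double[] indexed = new double[column.size()];
        for (int i = 0; i < column.size(); i++) {
            indexed[i] = labels.indexOf(column.get(i));
        }
        return indexed;
    }

    // Reverse lookup, the idea behind IndexToString: index -> original label.
    public static String toLabel(double index, List<String> labels) {
        return labels.get((int) index);
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("a", "b", "c", "a", "a", "c");
        List<String> labels = fitLabels(column);
        System.out.println(labels);                                     // [a, c, b]
        System.out.println(Arrays.toString(transform(column, labels))); // [0.0, 2.0, 1.0, 0.0, 0.0, 1.0]
        System.out.println(toLabel(1.0, labels));                       // c
    }
}
```

In Spark itself, the fitted StringIndexerModel holds this label list, which is why the question's code passes labelIndexer.labels() to IndexToString.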

VectorIndexer:

  • is used to index categorical predictors in a featuresCol column. Remember that featuresCol is a single column of vectors (refer to featuresCol and labelCol). Each row is a vector that contains one value per predictor.
  • if you have string-type predictors, you first need to index those columns with StringIndexer. featuresCol contains vectors, and vectors cannot contain string values.
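The per-feature decision that VectorIndexer makes can be sketched in plain Java as follows. This is an illustration under stated assumptions, not Spark's implementation: the names VectorIndexerSketch, fitCategoryMaps and transform are hypothetical, vectors are modelled as double[] rows, and category indices are assigned in ascending value order, which may differ in detail from what Spark does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only; this is not Spark's implementation.
class VectorIndexerSketch {

    // "fit": for every feature position with at most maxCategories distinct
    // values, build a map from original value to category index. Positions
    // with more distinct values are treated as continuous and get no entry.
    public static Map<Integer, Map<Double, Integer>> fitCategoryMaps(double[][] rows, int maxCategories) {
        Map<Integer, Map<Double, Integer>> categoryMaps = new HashMap<>();
        int numFeatures = rows.length == 0 ? 0 : rows[0].length;
        for (int j = 0; j < numFeatures; j++) {
            TreeSet<Double> distinct = new TreeSet<>();
            for (double[] row : rows) {
                distinct.add(row[j]);
            }
            if (distinct.size() <= maxCategories) {
                Map<Double, Integer> valueToIndex = new HashMap<>();
                for (double value : distinct) {
                    // Ascending value order is an assumption of this sketch.
                    valueToIndex.put(value, valueToIndex.size());
                }
                categoryMaps.put(j, valueToIndex);
            }
        }
        return categoryMaps;
    }

    // "transform": re-index categorical positions, leave continuous ones unchanged.
    // Values not seen during fit are not handled here.
    public static double[] transform(double[] row, Map<Integer, Map<Double, Integer>> categoryMaps) {
        double[] out = row.clone();
        for (Map.Entry<Integer, Map<Double, Integer>> entry : categoryMaps.entrySet()) {
            int j = entry.getKey();
            out[j] = entry.getValue().get(row[j]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Feature 0 has 2 distinct values, feature 1 has 5.
        double[][] rows = {
            {10.0, 0.5}, {20.0, 1.7}, {10.0, 2.9}, {20.0, 3.1}, {10.0, 4.4}
        };
        Map<Integer, Map<Double, Integer>> maps = fitCategoryMaps(rows, 4);
        System.out.println(maps.keySet());                                               // [0]
        System.out.println(Arrays.toString(transform(new double[] {20.0, 3.1}, maps)));  // [1.0, 3.1]
    }
}
```

With maxCategories set to 4, as in the question, only feature 0 is treated as categorical; feature 1 passes through untouched.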

Take a look here for example: https://mingchen0919.github.io/learning-apache-spark/StringIndexer-and-VectorIndexer.html

answered Sep 18 '22 21:09 by Amit Haim