I am running a very simple Spark (2.4.0 on Databricks) ML script:
from pyspark.ml.clustering import LDA
lda = LDA(k=10, maxIter=100).setFeaturesCol('features')
model = lda.fit(dataset)
But I received the following error:
IllegalArgumentException: 'requirement failed: Column features must be of type equal to one of the following types: [struct<type:tinyint,size:int,indices:array<int>,values:array<double>>, array<double>, array<float>] but was actually of type array<double>.'
Why is my array<double> not an array<double>?
Here is the schema:
root
|-- BagOfWords: struct (nullable = true)
| |-- indices: array (nullable = true)
| | |-- element: long (containsNull = true)
| |-- size: long (nullable = true)
| |-- type: long (nullable = true)
| |-- values: array (nullable = true)
| | |-- element: double (containsNull = true)
|-- tokens: array (nullable = true)
| |-- element: string (containsNull = true)
|-- features: array (nullable = true)
| |-- element: double (containsNull = true)
The error message is misleading because it prints only each type's catalog string, which drops the array's containsNull flag. The accepted array type is ArrayType(DoubleType, containsNull = false), while your features column is array<double> with containsNull = true (see element: double (containsNull = true) in your schema), and the type-equality check compares that flag too. So both types print as array<double>, yet they are not equal.
You probably need to convert the column into vector form. Note that VectorAssembler (from pyspark.ml.feature) assembles numeric, boolean, and vector input columns, not arrays, so for an existing array<double> column the more direct route is a small UDF that wraps each row in an ML Vector, as sketched below.
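A minimal sketch of that conversion, assuming dataset is the DataFrame from the question, the features arrays hold no null elements, and LDA is already imported as above:

from pyspark.ml.linalg import Vectors, VectorUDT
from pyspark.sql.functions import udf

# Inspect the exact type: the containsNull flag is what the error message hides
print(dataset.schema['features'].dataType)  # e.g. ArrayType(DoubleType,true)

# Wrap each array<double> row in a DenseVector; VectorUDT is the
# struct<type,size,indices,values> type listed first in the error
to_vector = udf(lambda xs: Vectors.dense(xs), VectorUDT())
dataset = dataset.withColumn('features', to_vector('features'))

model = LDA(k=10, maxIter=100).setFeaturesCol('features').fit(dataset)

On Spark 3.1 and later, pyspark.ml.functions.array_to_vector does the same job without a hand-rolled UDF.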