I have a list that is generated by a function. When I print the list:
print(preds_labels)
I obtain:
[(0.,8.),(0.,13.),(0.,19.),(0.,19.),(0.,19.),(0.,19.),(0.,19.),(0.,20.),(0.,21.),(0.,23.)]
But when I try to create a DataFrame with this command:
df = sqlContext.createDataFrame(preds_labels, ["prediction", "label"])
I get an error message:
not supported type: type 'numpy.float64'
If I create the list manually, I have no problem. Any idea what is going wrong?
PySpark uses its own type system, and unfortunately it does not handle NumPy types well. It does work with native Python types, though, so you can manually convert each numpy.float64 to a Python float, like this:
df = sqlContext.createDataFrame(
[(float(tup[0]), float(tup[1])) for tup in preds_labels],
["prediction", "label"]
)
Note that PySpark will then infer both columns as pyspark.sql.types.DoubleType.
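If the list can also contain values that are not floats, a more general conversion might look like the sketch below. It relies on the fact that NumPy scalars such as numpy.float64 expose an .item() method that returns the equivalent built-in Python object. The helper names to_native and rows_to_native are made up for this illustration and are not part of PySpark or NumPy, and the sketch assumes every element is a scalar rather than an array.

```python
def to_native(value):
    """Return a built-in Python scalar for NumPy scalars; pass other values through."""
    # NumPy scalar types (e.g. numpy.float64, numpy.int64) provide .item(),
    # which returns the matching built-in Python object (float, int, ...).
    # Assumption: values are scalars; multi-element arrays are not handled here.
    item = getattr(value, "item", None)
    return item() if callable(item) else value


def rows_to_native(rows):
    """Convert every element of every row (tuple) to a native Python type."""
    return [tuple(to_native(v) for v in row) for row in rows]


# Hypothetical usage with the names from the question:
# df = sqlContext.createDataFrame(rows_to_native(preds_labels), ["prediction", "label"])
```

Values that are already plain Python objects have no .item() method and are returned unchanged, so the helper is safe to apply to a manually created list as well.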