Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert Column of List to Dataframe

I have a column of lists in a spark dataframe.

+-----------------+
|features         |
+-----------------+
|[0,45,63,0,0,0,0]|
|[0,0,0,85,0,69,0]|
|[0,89,56,0,0,0,0]|
+-----------------+

How do I convert that to a spark dataframe where each element in the list is a column in the dataframe? We can assume that the lists will be the same size.

For Example,

+--------------------+
|c1|c2|c3|c4|c5|c6|c7|
+--------------------+
|0 |45|63|0 |0 |0 |0 |
|0 |0 |0 |85|0 |69|0 |
|0 |89|56|0 |0 |0 |0 |
+--------------------+
like image 460
Bryce Ramgovind Avatar asked Sep 10 '25 18:09

Bryce Ramgovind


1 Answers

What you describe is actually the invert of the VectorAssembler operation.

You can do it by converting to an intermediate RDD, as follows:

spark.version
# u'2.2.0'

# your data:
df.show(truncate=False)
# +-----------------+ 
# |        features | 
# +-----------------+
# |[0,45,63,0,0,0,0]|
# |[0,0,0,85,0,69,0]|
# |[0,89,56,0,0,0,0]|
# +-----------------+ 

dimensionality = 7
out = df.rdd.map(lambda x: [float(x[0][i]) for i in range(dimensionality)]).toDF(schema=['c'+str(i+1) for i in range(dimensionality)])
out.show()
# +---+----+----+----+---+----+---+ 
# | c1|  c2|  c3|  c4| c5|  c6| c7|
# +---+----+----+----+---+----+---+ 
# |0.0|45.0|63.0| 0.0|0.0| 0.0|0.0|
# |0.0| 0.0| 0.0|85.0|0.0|69.0|0.0| 
# |0.0|89.0|56.0| 0.0|0.0| 0.0|0.0| 
# +---+----+----+----+---+----+---+
like image 97
desertnaut Avatar answered Sep 13 '25 10:09

desertnaut



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!