I have the following element:
a = Row(ts=1465326926253, myid=u'1234567', mytype=u'good')
a is an instance of the Spark DataFrame Row class. I want to append a new field to a, so that a would look like:
a = Row(ts=1465326926253, myid=u'1234567', mytype=u'good', name=u'john')
In PySpark, to add a new column with a constant value to a DataFrame, use the lit() function (imported via from pyspark.sql.functions import lit). lit() takes the constant value you want to add and returns a Column type; if you want to add a NULL/None value, use lit(None).
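For example, here is a minimal sketch of the lit() approach; the DataFrame df is built from the Row in the question purely for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

# Build a small DataFrame matching the Row from the question
df = spark.createDataFrame([(1465326926253, u'1234567', u'good')],
                           ['ts', 'myid', 'mytype'])

# Add a constant column; lit() wraps the literal value as a Column
df = df.withColumn('name', lit(u'john'))

# Add a NULL column the same way
df = df.withColumn('nickname', lit(None))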
You can update a PySpark DataFrame column using withColumn(), select(), and sql(). Since DataFrames are distributed, immutable collections, you can't really change the column values in place; when you change a value using withColumn() or any other approach, PySpark returns a new DataFrame with the updated values.
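As a hedged sketch of updating an existing column (the column names and values here are illustrative, not from the question):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(u'1234567', u'good')], ['myid', 'mytype'])

# withColumn() with an existing column name replaces that column and
# returns a brand-new DataFrame; the original df is left unchanged
updated = df.withColumn('mytype',
                        F.when(F.col('mytype') == 'good', 'great')
                         .otherwise(F.col('mytype')))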
Here is an answer that works. First convert the Row to a dictionary, then update the dictionary, and then build a new PySpark Row from it.
Code is as follows:
from pyspark.sql import Row

# Create the PySpark Row
row = Row(field1=12345, field2=0.0123, field3=u'Last Field')

# Convert to a Python dict
temp = row.asDict()

# Do whatever you want to the dict, like adding a new field
temp["field4"] = "it worked!"

# Build a new Row from the updated dict
output = Row(**temp)

# How it looks:
output
# Row(field1=12345, field2=0.0123, field3=u'Last Field', field4='it worked!')