Let's say I have a numpy array `a` that contains the numbers 1-10: `[1 2 3 4 5 6 7 8 9 10]`.
I also have a Spark dataframe to which I want to add my numpy array `a`. I figure that a column of literals will do the job. This doesn't work:
```python
df = df.withColumn("NewColumn", F.lit(a))
```

It fails with:

```
Unsupported literal type class java.util.ArrayList
```
But this works:
```python
df = df.withColumn("NewColumn", F.lit(a[0]))
```
How can I do it?
Example DF before:

| col1                 |
|----------------------|
| a b c d e f g h i j  |
Expected result:

| col1                 | NewColumn            |
|----------------------|----------------------|
| a b c d e f g h i j  | 1 2 3 4 5 6 7 8 9 10 |
The PySpark SQL function `lit()` is used to add a new column to a DataFrame by assigning a literal or constant value.
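For instance, a minimal sketch (assuming an active `SparkSession` named `spark`; the column name `Constant` is just for illustration):

```python
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a b c d e f g h i j",)], ["col1"])

# F.lit() wraps a single Python scalar in a Column; every row gets that value
df.withColumn("Constant", F.lit(42)).show(truncate=False)
# +-------------------+--------+
# |col1               |Constant|
# +-------------------+--------+
# |a b c d e f g h i j|42      |
# +-------------------+--------+
```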
Create a PySpark ArrayType: you can create an instance of an ArrayType using the `ArrayType()` class. It takes an argument `elementType` and one optional argument `containsNull` that specifies whether an element can be null (`True` by default). `elementType` should be a PySpark type that extends the `DataType` class.
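For example, a short sketch of constructing the type directly (using the classes from `pyspark.sql.types`):

```python
from pyspark.sql.types import ArrayType, IntegerType

# elementType=IntegerType(), containsNull=False: elements cannot be null
arr_type = ArrayType(IntegerType(), containsNull=False)
print(arr_type.simpleString())  # array<int>
```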
**array**

```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
df = spark.createDataFrame([['a b c d e f g h i j ']], ['col1'])

df = df.withColumn("NewColumn", F.array([F.lit(x) for x in a]))
df.show(truncate=False)
df.printSchema()
# +--------------------+-------------------------------+
# |col1                |NewColumn                      |
# +--------------------+-------------------------------+
# |a b c d e f g h i j |[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]|
# +--------------------+-------------------------------+
# root
#  |-- col1: string (nullable = true)
#  |-- NewColumn: array (nullable = false)
#  |    |-- element: integer (containsNull = false)
```
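Note that the snippet above uses a plain Python list. If `a` is a real numpy array, as in the question, its elements are numpy scalars (e.g. `numpy.int64`), which some Spark versions reject in `lit()` with the same "Unsupported literal type" error; converting to native Python values first with `tolist()` sidesteps that. A minimal sketch under that assumption:

```python
import numpy as np
import pyspark.sql.functions as F

a = np.arange(1, 11)  # the numpy array 1-10 from the question

# tolist() yields native Python ints, which F.lit() handles everywhere
df = df.withColumn("NewColumn", F.array([F.lit(x) for x in a.tolist()]))
```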
@pault commented (Python 2.7):
You can hide the loop using `map`:

```python
df.withColumn("NewColumn", F.array(map(F.lit, a)))
```
@abegehr added the Python 3 version (in Python 3, `map` returns an iterator rather than a list, so it has to be unpacked with `*` into separate column arguments for `F.array`):

```python
df.withColumn("NewColumn", F.array(*map(F.lit, a)))
```
**udf**

```python
import pyspark.sql.types as T

# Define a UDF that returns the list as an array of integers
def arrayUdf():
    return a

callArrayUdf = F.udf(arrayUdf, T.ArrayType(T.IntegerType()))

# Call the UDF (it takes no input columns)
df = df.withColumn("NewColumn", callArrayUdf())
```
The output is the same.