So I need to create an array of the numbers 1 to 100 as the value of an extra column on each row. Using the array() function with a bunch of literal values works, but surely there's a way to use / convert a Scala Range (a to b) instead of listing each number individually?
spark.sql("SELECT key FROM schema.table")
.otherCommands
.withColumn("range", array(lit(1), lit(2), ..., lit(100)))
To something like:
withColumn("range", array(1 to 100))
From Spark 2.4 you can use the sequence function. If you have this DataFrame:
df.show()
+--------+
|column_1|
+--------+
| 1|
| 2|
| 3|
| 0|
+--------+
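To reproduce this demo DataFrame, a minimal sketch (assuming a SparkSession named spark is in scope):
import spark.implicits._

// Single-column DataFrame matching the output shown above.
val df = Seq(1, 2, 3, 0).toDF("column_1")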
If you use the sequence function from 0 to column_1, you get this:
df.withColumn("range", sequence(lit(0), col("column_1"))).show()
+--------+------------+
|column_1| range|
+--------+------------+
| 1| [0, 1]|
| 2| [0, 1, 2]|
| 3|[0, 1, 2, 3]|
| 0| [0]|
+--------+------------+
For this case, set both values with lit:
df.withColumn("range", sequence(lit(0), lit(100)))
You can use the map function with the lit inbuilt function inside the array function:
df.withColumn("range", array((1 to 100).map(lit(_)): _*))
For Spark 2.2+, a new function typedLit was introduced that easily solves this problem without using .map(lit(_)) on the array. From the documentation:
The difference between this function and lit is that this function can handle parameterized scala types e.g.: List, Seq and Map.
Use as follows:
import org.apache.spark.sql.functions.typedLit
df.withColumn("range", typedLit((1 to 100).toList))