 

Create new column with an array of range of numbers

I need to add an extra column to each row whose value is an array of the numbers 1 to 100.

Using the array() function with a bunch of literal values works, but surely there's a way to use or convert a Scala Range (a to b) instead of listing each number individually?

spark.sql("SELECT key FROM schema.table")
  .otherCommands
  .withColumn("range", array(lit(1), lit(2), ..., lit(100)))

I'd like to turn that into something like:

withColumn("range", array(1 to 100))
asked Jul 04 '18 by ChiMo

3 Answers

From Spark 2.4 you can use the sequence function. If you have this dataframe:

df.show()
+--------+
|column_1|
+--------+
|       1|
|       2|
|       3|
|       0|
+--------+

Using sequence from 0 up to column_1, you get this:

df.withColumn("range", sequence(lit(0), col("column_1"))).show()
+--------+------------+
|column_1|       range|
+--------+------------+
|       1|      [0, 1]|
|       2|   [0, 1, 2]|
|       3|[0, 1, 2, 3]|
|       0|         [0]|
+--------+------------+

To get the same fixed range for every row, set both bounds with lit:

df.withColumn("range", sequence(lit(0), lit(100)))
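As a rough pure-Scala sketch (no Spark session needed; sequenceLike is a hypothetical stand-in, not a Spark API), this is the per-row behavior of sequence: an inclusive integer range from start to stop, stepping down when start is greater than stop:

```scala
// Hypothetical stand-in for what Spark's sequence(start, stop) computes per row:
// an inclusive range, with a default step of 1 (or -1 when start > stop).
def sequenceLike(start: Int, stop: Int): Seq[Int] =
  if (start <= stop) start to stop else start to stop by -1

// Mirrors the rows in the example output above:
// sequenceLike(0, 1) == Seq(0, 1)
// sequenceLike(0, 3) == Seq(0, 1, 2, 3)
// sequenceLike(0, 0) == Seq(0)
```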
answered Nov 20 '22 by Luis A.G.


You can map the lit built-in function over a Scala Range and pass the result to the array function as varargs:

df.withColumn("range", array((1 to 100).map(lit(_)): _*))
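Why this works: (1 to 100).map(lit(_)) builds a Seq[Column], and the : _* annotation expands that sequence into the varargs parameter of array(cols: Column*). A pure-Scala sketch with hypothetical stand-ins for Column, lit, and array shows the mechanics without a Spark session:

```scala
// Hypothetical stand-ins (illustration only, not the Spark API) to show how a
// Range is mapped to columns and then splatted into a varargs function.
final case class FakeColumn(value: Int)                      // stand-in for Column
def lit(n: Int): FakeColumn = FakeColumn(n)                  // stand-in for functions.lit
def array(cols: FakeColumn*): Seq[Int] = cols.map(_.value)   // stand-in for functions.array

// Same shape as the Spark one-liner: map lit over the Range, expand with : _*
val range = array((1 to 100).map(lit(_)): _*)
// range contains the integers 1 through 100
```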
answered Nov 20 '22 by Ramesh Maharjan


For Spark 2.2+, a new function typedLit was introduced that solves this problem without mapping lit over the range. From the documentation:

The difference between this function and lit is that this function can handle parameterized scala types e.g.: List, Seq and Map.

Use as follows:

import org.apache.spark.sql.functions.typedLit

df.withColumn("range", typedLit((1 to 100).toList))
answered Nov 20 '22 by Shaido