In Python, this is how I would do it.
>>> x
array([10, 9, 8, 7, 6, 5, 4, 3, 2])
>>> x[np.array([3, 3, 1, 8])]
array([7, 7, 9, 2])
This doesn't work in the Scala Spark shell:
scala> val indices = Array(3,2,0)
indices: Array[Int] = Array(3, 2, 0)
scala> val A = Array(10,11,12,13,14,15)
A: Array[Int] = Array(10, 11, 12, 13, 14, 15)
scala> A(indices)
<console>:28: error: type mismatch;
found : Array[Int]
required: Int
A(indices)
The foreach method doesn't work either:
scala> indices.foreach(println(_))
3
2
0
scala> indices.foreach(A(_))
<no output>
What I want is the result of B:
scala> val B = Array(A(3),A(2),A(0))
B: Array[Int] = Array(13, 12, 10)
However, I don't want to hard code it like that because I don't know how long indices is or what would be in it.
Apache Spark / Spark SQL Functions Spark SQL provides a slice () function to get the subset or range of elements from an array (subarray) column of DataFrame and slice function is part of the Spark SQL Array functions group. In this article, I will explain the syntax of the slice () function and it’s usage with a scala example.
Now, let’s use the slice () SQL function to slice the array and get the subset of elements from an array column. val sliceDF = df. withColumn ("languages", slice ( col ("languagesAtSchool"),2,3)) . drop ("languagesAtSchool") sliceDF. printSchema () sliceDF. show (false)
Column slice function takes the first argument as Column of type ArrayType following start of the array index and the number of elements to extract from the array. Like all Spark SQL functions, slice () function returns a org.apache.spark.sql.Column of ArrayType.
Since Spark provides a way to execute the raw SQL, let’s learn how to write the same slicing example using Spark SQL expression. In order to use raw SQL, first, you need to create a table using createOrReplaceTempView (). This creates a temporary view from the Dataframe and this view is available lifetime of current Spark context.
The most concise way I can think of is to flip your mental model and put indices first:
indices map A
And, I would potentially suggest using lift
to return an Option
indices map A.lift
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With