Spark 2.4 introduced useful new Spark SQL functions involving arrays, but I was a little puzzled when I found out that the result of:
select array_remove(array(1, 2, 3, null, 3), null)
is null
and not [1, 2, 3, 3].
Is this expected behavior? Is it possible to remove nulls using array_remove?
As a side note, for now the alternative I am using is a higher-order function in Databricks:
select filter(array(1, 2, 3, null, 3), x -> x is not null)
To answer your first question, "Is this an expected behavior?": yes. The official notebook (https://docs.databricks.com/_static/notebooks/apache-spark-2.4-functions.html) describes array_remove as "Remove all elements that equal to the given element from the given array." NULL corresponds to an undefined value, comparisons against it are themselves undefined, and so the result is also undefined (NULL). I think NULLs are therefore outside the purview of this function.
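This follows SQL's three-valued logic: comparing anything with NULL yields NULL rather than false, and a NULL argument makes the whole array_remove expression NULL. A pure-Python sketch of those semantics (no Spark required; the helper names are illustrative, not Spark APIs):

```python
# NULL is modeled as Python's None. SQL equality is three-valued:
# any comparison involving NULL yields NULL (None), not False.
def sql_eq(a, b):
    if a is None or b is None:
        return None  # NULL = anything -> NULL
    return a == b

# array_remove drops elements whose comparison is definitely True.
# A NULL "element to remove" argument makes the whole expression NULL,
# which is why array_remove(array(1, 2, 3, null, 3), null) returns null.
def array_remove(arr, elem):
    if arr is None or elem is None:
        return None  # NULL input -> NULL output
    return [x for x in arr if sql_eq(x, elem) is not True]
```

Note that elements that are NULL inside the array survive removal of a concrete value, because NULL = 3 is NULL, not true.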
It's good that you found a way to work around this. You can also use
spark.sql("""SELECT array_except(array(1, 2, 3, 3, null, 3, 3, 3, 4, 5), array(null))""").show()
but the downside is that the result will be without duplicates.
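That downside comes from array_except's documented behavior: it returns the distinct elements of the first array that are not in the second. A plain-Python sketch (the helper name is illustrative):

```python
# Sketch of array_except semantics: keep the distinct elements of `a`
# that do not appear in `b`, preserving first-occurrence order.
# This is why duplicates (the repeated 3s) are lost along with the nulls.
def array_except(a, b):
    seen = set()
    out = []
    for x in a:
        if x not in b and x not in seen:
            seen.add(x)
            out.append(x)
    return out
```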
You can do something like this in Spark 2:
import org.apache.spark.sql.functions._
import org.apache.spark.sql._
/**
* Array without nulls
* For complex types, you are responsible for passing in a nullPlaceholder of the same type as elements in the array
*/
def non_null_array(columns: Seq[Column], nullPlaceholder: Any = "רכוב כל יום"): Column =
array_remove(array(columns.map(c => coalesce(c, lit(nullPlaceholder))): _*), nullPlaceholder)
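The coalesce-then-remove trick above can be sketched without Spark (the sentinel here is an assumption; it only needs to be a value that can never occur in the real data, which is why the Spark version asks you to supply a suitable nullPlaceholder for complex types):

```python
# Replace each NULL (None) with a sentinel, then remove the sentinel --
# the same shape as coalesce + array_remove in the Spark 2 snippet above.
SENTINEL = object()  # a fresh object never equals any real element

def non_null_array(xs):
    replaced = [SENTINEL if x is None else x for x in xs]  # coalesce step
    return [x for x in replaced if x is not SENTINEL]      # array_remove step
```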
In Spark 3, there is a new filter function for arrays, and you can do:
df.select(filter(col("array_column"), x => x.isNotNull))