How to access values in array column?


I have a Dataframe with one column. Each row of that column has an Array of String values:

Values in my Spark 2.2 Dataframe

["123", "abc", "2017", "ABC"]
["456", "def", "2001", "ABC"]
["789", "ghi", "2017", "DEF"]

org.apache.spark.sql.DataFrame = [col: array<string>]

root
 |-- col: array (nullable = true)
 |    |-- element: string (containsNull = true)

What is the best way to access elements in the array? For example, I would like to extract the distinct values of the fourth element for rows where the year (the third element) is 2017 (expected answer: "ABC", "DEF").

2 Answers

Since Spark 2.4.0, there is a new function element_at($array_column, $index). Its index is 1-based for arrays, so the fourth element is element_at($array_column, 4).

See Spark docs
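A minimal sketch of how element_at could be applied to the question's data, assuming Spark 2.4.0 or later (so it would not run on the asker's Spark 2.2). The sample rows are copied from the question; the local SparkSession setup and the output column name `value` are assumptions added for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, element_at}

// Assumption: a local session for experimenting; in spark-shell, getOrCreate()
// simply returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("element_at-example")
  .getOrCreate()
import spark.implicits._

// Sample rows from the question, one array<string> column named "col".
val df = Seq(
  Seq("123", "abc", "2017", "ABC"),
  Seq("456", "def", "2001", "ABC"),
  Seq("789", "ghi", "2017", "DEF")
).toDF("col")

// element_at is 1-based: position 3 is the year, position 4 is the value wanted.
val result = df
  .where(element_at(col("col"), 3) === "2017")
  .select(element_at(col("col"), 4).as("value")) // "value" is a hypothetical name
  .distinct()

result.show()
```

For the sample data the result should contain the two rows ABC and DEF; the row order after distinct() is not guaranteed.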

 // getItem is 0-based: index 2 is the year, index 3 is the value to extract
 df.where($"col".getItem(2) === lit("2017")).select($"col".getItem(3)).distinct()

See getItem in the Column API docs: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column
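For anyone who wants to try the getItem approach end to end, here is a small self-contained sketch using the rows from the question. The local SparkSession setup and the column aliases `year` and `code` are assumptions introduced only to make the query easier to read; they are not part of the original data.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimenting; in spark-shell, getOrCreate()
// simply returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("getItem-example")
  .getOrCreate()
import spark.implicits._

// Sample rows from the question, one array<string> column named "col".
val df = Seq(
  Seq("123", "abc", "2017", "ABC"),
  Seq("456", "def", "2001", "ABC"),
  Seq("789", "ghi", "2017", "DEF")
).toDF("col")

// getItem is 0-based: index 2 holds the year, index 3 the value of interest.
// Aliasing the extracted elements first keeps the filter readable.
val codes2017 = df
  .select($"col".getItem(2).as("year"), $"col".getItem(3).as("code"))
  .where($"year" === "2017")
  .select("code")
  .distinct()

codes2017.show()
```

With the sample rows this should yield ABC and DEF (in no guaranteed order). An out-of-range index passed to getItem returns null under Spark's default (non-ANSI) settings, so shorter arrays are simply filtered out by the year comparison.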

