Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to filter Spark dataframe if one column is a member of another column

I have a dataframe with two columns(one string and one array of string):

root
 |-- user: string (nullable = true)
 |-- users: array (nullable = true)
 |    |-- element: string (containsNull = true)

How can I filter the dataframe so that the result dataframe only contains rows that user is in users?

like image 906
Rainfield Avatar asked Dec 02 '22 14:12

Rainfield


1 Answers

Quick and simple:

import org.apache.spark.sql.functions.expr

df.where(expr("array_contains(users, user)")
like image 82
zero323 Avatar answered Dec 21 '22 07:12

zero323