I have a DataFrame NAMES which, when I output it via display(NAMES), shows:
NAMES
John
Sarah
Michael
Sean
I also have a list students; print(students) gives:
{John, Alan, Andy}
Question:
Based on this list (students), how can I loop through the "NAMES" column of the DataFrame and output to another list the names that appear both in the students list and in the DataFrame?
Expected output of list: "John"
I have tried:
    list2 = []
    for i in NAMES:
        for g in students:
            if i == g:
                list2.append(i)
but I end up with an error. How can I implement this via PySpark?
Thanks.
In general, looping through data in PySpark is not very efficient. When possible, use native PySpark functions. For your specific question, you can use the filter function together with isin, which keeps only the rows of your DataFrame whose name is in the students list:
    from pyspark.sql.functions import col
    NAMES.filter(col("NAMES").isin(students)).select("NAMES")
In your example the only return value will be John.