I have a large number of columns in a PySpark DataFrame, say 200. I want to select all of the columns except 3-4 of them. How do I select these columns without having to manually type the names of every column I want to keep?
In the end, I settled on the following:
Drop:
df.drop('column_1', 'column_2', 'column_3')
Select:
df.select([c for c in df.columns if c not in {'column_1', 'column_2', 'column_3'}])
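For completeness, here is a minimal, self-contained sketch showing both approaches end to end. The SparkSession setup and the sample column names (id, column_1, column_2, column_3) are illustrative assumptions, not part of the original question:

from pyspark.sql import SparkSession

# Illustrative local session and toy data; substitute your own DataFrame.
spark = SparkSession.builder.master("local[*]").appName("exclude-columns-demo").getOrCreate()

df = spark.createDataFrame(
    [(1, "a", 2.0, True), (2, "b", 3.0, False)],
    ["id", "column_1", "column_2", "column_3"],
)

# drop() takes column names as positional arguments, silently skips names
# that do not exist, and returns a new DataFrame.
kept = df.drop("column_1", "column_2", "column_3")
print(kept.columns)  # ['id']

# Equivalent select(): build the keep-list from df.columns, excluding a set.
excluded = {"column_1", "column_2", "column_3"}
kept2 = df.select([c for c in df.columns if c not in excluded])
print(kept2.columns)  # ['id']

spark.stop()

Note that both calls are transformations that return a new DataFrame; the original df is left unchanged. drop() is the more concise option when you know the exact names to remove, while the select() list comprehension is handy if you need extra filtering logic (e.g. excluding by prefix or pattern).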