
How can I pass a list of columns to select in a PySpark DataFrame?

I have a list of column names.

columns = ['home','house','office','work']

and I would like to pass the values in that list as column names to "select" on a DataFrame.

I tried this:

df_tables_full = df_tables_full.select('time_event','kind','schema','table',columns)

but I received the error below:

TypeError: Invalid argument, not a string or column: ['home', 'house', 'office',
'work'] of type <class 'list'>. For column literals, use 'lit', 'array', 'struct' 
or 'create_map' function.

Do you have any idea? Thank you!

Diogenes asked Mar 20 '20 19:03


1 Answer

Put * before columns to unpack the list, so that each name in it is passed to .select as a separate argument.

columns = ['home','house','office','work']

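# Select the fixed columns plus every column named in the list, and display the result
df_tables_full.select('time_event','kind','schema','table',*columns).show()

# Or assign the result back to keep only those columns
df_tables_full = df_tables_full.select('time_event','kind','schema','table',*columns)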
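
To see why the star matters without needing a Spark session, here is a minimal plain-Python sketch. The select function below is a hypothetical stand-in, not PySpark's implementation: it only mimics the idea that each positional argument must be a column name, and its error text is shortened for illustration.

```python
# Hypothetical stand-in for DataFrame.select (an assumption for illustration):
# it accepts any number of positional column names and rejects non-strings.
def select(*cols):
    for col in cols:
        if not isinstance(col, str):
            raise TypeError(f"Invalid argument, not a string or column: {col!r}")
    return list(cols)

columns = ['home', 'house', 'office', 'work']

# Without the star, the whole list arrives as a single fifth argument.
try:
    select('time_event', 'kind', 'schema', 'table', columns)
except TypeError as exc:
    error_message = str(exc)

# With the star, each list element becomes its own positional argument.
selected = select('time_event', 'kind', 'schema', 'table', *columns)

# Equivalent alternative: build one flat list first, then unpack it.
all_columns = ['time_event', 'kind', 'schema', 'table'] + columns
assert select(*all_columns) == selected
print(selected)
```

In PySpark itself, select should also accept a single list as its only argument, so building the flat list first and calling df_tables_full.select(all_columns) is another option.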
notNull answered Sep 21 '22 13:09