How to drop multiple column names given in a list from Spark DataFrame?

Tags: dataframe, apache-spark, apache-spark-sql, pyspark, pyspark-sql

I have a dynamic list that is created based on the value of n.

n = 3
drop_lst = ['a' + str(i) for i in range(n)]
df.drop(drop_lst)

But the above does not work.

Note:

My use case requires a dynamic list.

If I pass the column names directly, without a list, it works:

df.drop('a0','a1','a2')

How do I make the drop function work with a list?

Spark 2.2 doesn't seem to have this capability. Is there a way to make it work without using select()?

asked Dec 15 '17 10:12

GeorgeOfTheRF

3 Answers

You can use the * operator to pass the contents of your list as arguments to drop():

df.drop(*drop_lst)
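
To see why the list has to be unpacked, here is a minimal sketch in plain Python. FakeFrame is a hypothetical stand-in, not a PySpark class; it only mimics a variadic drop(*cols) signature that expects each argument to be a single column name, so it runs without a Spark session.

```python
class FakeFrame:
    """Hypothetical stand-in for a DataFrame; NOT part of PySpark."""

    def __init__(self, columns):
        self.columns = list(columns)

    def drop(self, *cols):
        # Each positional argument must be a single column name.
        for col in cols:
            if not isinstance(col, str):
                raise TypeError("each argument must be a column name string")
        return FakeFrame(c for c in self.columns if c not in cols)


n = 3
drop_lst = ['a' + str(i) for i in range(n)]  # ['a0', 'a1', 'a2']
df = FakeFrame(['id', 'a0', 'a1', 'a2', 'b'])

# Passing the list itself hands drop() ONE argument (a list), which is rejected.
try:
    df.drop(drop_lst)
    list_accepted = True
except TypeError:
    list_accepted = False

# Unpacking with * hands drop() three separate string arguments.
remaining = df.drop(*drop_lst).columns
print(list_accepted, remaining)
```

In this sketch, df.drop(*drop_lst) is the same call as df.drop('a0', 'a1', 'a2'), which is the form the question reports as working.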

answered Sep 30 '22 18:09

mtoto

You can pass the column names as comma-separated arguments, e.g.:

df.drop("col1","col11","col21")

answered Sep 30 '22 17:09

vaquar khan

This is how to drop a specified number of consecutive columns in Scala:

val ll = dfwide.schema.names.slice(1,5)
dfwide.drop(ll:_*).show

slice takes two parameters: the start index (inclusive) and the end index (exclusive).
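
For readers working in PySpark rather than Scala, the same idea can be sketched with an ordinary Python list slice. The column names below are made up for illustration; with a real DataFrame the list would come from df.columns, and the result would be passed on as df.drop(*to_drop).

```python
# Hypothetical column names, standing in for the list returned by df.columns.
columns = ['id', 'c1', 'c2', 'c3', 'c4', 'c5']

# Like Scala's slice(1, 5), the Python slice [1:5] includes index 1
# and excludes index 5, so it selects four consecutive names.
to_drop = columns[1:5]

# The columns that would remain after dropping that range.
kept = [c for c in columns if c not in to_drop]

print(to_drop)
print(kept)
```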