I found that PySpark has a method called drop, but it seems it can only drop one column at a time. Any ideas about how to drop multiple columns at the same time?
df.drop(['col1','col2'])
TypeError                                 Traceback (most recent call last)
<ipython-input-96-653b0465e457> in <module>()
----> 1 selectedMachineView = machineView.drop([['GpuName','GPU1_TwoPartHwID']])

/usr/hdp/current/spark-client/python/pyspark/sql/dataframe.pyc in drop(self, col)
   1257             jdf = self._jdf.drop(col._jc)
   1258         else:
-> 1259             raise TypeError("col should be a string or a Column")
   1260         return DataFrame(jdf, self.sql_ctx)
   1261

TypeError: col should be a string or a Column
In PySpark, the DataFrame.drop() method removes columns from a DataFrame.
Since PySpark 2.1.0, the drop method supports multiple columns:
PySpark 2.0.2:
DataFrame.drop(col)
PySpark 2.1.0:
DataFrame.drop(*cols)
Example:
df.drop('col1', 'col2')
or using the * operator to unpack a list:
df.drop(*['col1', 'col2'])
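If you are stuck on a version where drop only accepts a single column (such as 2.0.2 above), one possible workaround is to apply drop once per column, or to select everything except the unwanted columns. The sketch below is only an illustration: the helper names drop_columns and keep_all_but are made up for this answer and are not part of PySpark. They rely only on df.drop(name), df.select(list_of_names) and df.columns.

```python
from functools import reduce


def drop_columns(df, cols):
    """Drop several columns by calling df.drop() once per column name.

    Hypothetical helper (not a PySpark API). Intended for versions where
    DataFrame.drop accepts only a single column.
    """
    return reduce(lambda acc, name: acc.drop(name), cols, df)


def keep_all_but(df, cols):
    """Alternative: select every column whose name is not listed in cols.

    Hypothetical helper (not a PySpark API).
    """
    unwanted = set(cols)
    return df.select([c for c in df.columns if c not in unwanted])


# Usage with an existing DataFrame df:
#   trimmed = drop_columns(df, ['col1', 'col2'])
#   trimmed = keep_all_but(df, ['col1', 'col2'])
```

On 2.1.0 and later, df.drop('col1', 'col2') is simpler and should be preferred.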