
PySpark: select a specific column by its position

How can I select a specific column of a DataFrame by its position (number) rather than by its name?

Like this in Pandas:

df = df.iloc[:,2]

Is this possible?

Laurent Cesaro asked Jun 18 '18 13:06


1 Answer

You can always get the name of the column with df.columns[n] and then select it:

df = spark.createDataFrame([[1,2], [3,4]], ['a', 'b'])

To select the column at position n (0-based):

n = 1
df.select(df.columns[n]).show()
+---+
|  b|
+---+
|  2|
|  4|
+---+
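
The same idea extends to several positions at once, much like df.iloc[:, [0, 2]] in Pandas. Below is a minimal sketch, assuming a hypothetical helper named select_by_position (it is not part of PySpark) that relies only on df.columns and df.select:

```python
def select_by_position(df, *positions):
    """Return df restricted to the columns at the given 0-based positions.

    Hypothetical helper, not a PySpark API. It only uses df.columns
    (a list of column names) and df.select (which accepts column names).
    """
    # Translate each position into the column name at that index.
    names = [df.columns[i] for i in positions]
    return df.select(*names)
```

With the df defined above, select_by_position(df, 1).show() should print the same table as df.select(df.columns[1]).show(). Because the lookup goes through column names, this sketch assumes the names are unique.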

To select all but column n:

n = 1

You can either use drop:

df.drop(df.columns[n]).show()
+---+
|  a|
+---+
|  1|
|  3|
+---+

Or select with a manually constructed list of column names:

df.select(df.columns[:n] + df.columns[n+1:]).show()
+---+
|  a|
+---+
|  1|
|  3|
+---+
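
To leave out more than one position, the list of names to keep can be built in a small function first. This is only a sketch: columns_except is a hypothetical name, and it assumes non-negative, 0-based positions:

```python
def columns_except(columns, *positions):
    """Return the names in `columns` whose 0-based position is not in `positions`.

    Hypothetical helper operating on a plain list of names such as df.columns.
    """
    skip = set(positions)  # positions to leave out
    return [name for i, name in enumerate(columns) if i not in skip]
```

The result can be passed straight to select, for example df.select(columns_except(df.columns, 1)).show(), which for the two-column df above keeps only column a.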
Psidom answered Sep 27 '22 19:09