Is there a way to reference Spark DataFrame columns by position using an integer?
Analogous Pandas DataFrame operation:
df.iloc[:, 0]  # give me all the rows at column position 0
The closest equivalent of Python's df.iloc in PySpark is collect().
PySpark examples:
X = df.collect()[0]['age']
or
X = df.collect()[0][1]  # row 0, col 1
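For context, here is a minimal end-to-end sketch of the collect() approach. The spark session variable and the name/age schema are assumptions made for illustration, not part of the original examples:
df = spark.createDataFrame([("Alice", 30), ("Bob", 25)], ["name", "age"])
rows = df.collect()      # pulls all rows to the driver as a list of Row objects
X = rows[0]["age"]       # row 0, column "age" -> 30
Y = rows[0][1]           # row 0, column position 1 -> 30
Keep in mind that collect() materializes the entire DataFrame on the driver, so it only makes sense for small results.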
Not really, but you can try something like this:
Python:
df = sc.parallelize([(1, "foo", 2.0)]).toDF()
df.select(*df.columns[:1]) # I assume [:1] is what you really want
## DataFrame[_1: bigint]
or
df.select(df.columns[1:3])
## DataFrame[_2: string, _3: double]
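To approximate the iloc-style lookup from the question, you can combine positional column selection with collect(). This is a sketch reusing the same df as above; collecting to the driver is what makes the integer row index possible:
df.select(df.columns[0]).collect()           # all rows, column position 0
df.select(df.columns[0]).collect()[0][0]     # value at row 0, column position 0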
Scala:
import org.apache.spark.sql.functions.col

val df = sc.parallelize(Seq((1, "foo", 2.0))).toDF()
df.select(df.columns.slice(0, 1).map(col(_)): _*)
Note:
Spark SQL doesn't support row indexing, and it is unlikely to ever support it, so indexing across the row dimension is not possible.
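If you need row access by position without collecting the whole DataFrame, one possible workaround (not from the original answer, and it still costs a full pass over the data) is to attach an explicit index with RDD.zipWithIndex and filter on it:
row0 = (df.rdd
          .zipWithIndex()                     # pairs each Row with its 0-based position
          .filter(lambda pair: pair[1] == 0)  # keep only "row 0"
          .map(lambda pair: pair[0])
          .collect())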