I have a PySpark DataFrame in Python:
from pyspark.sql.functions import col
dataset = sqlContext.range(0, 100).select((col("id") % 3).alias("key"))
The column name is key, and I would like to select this column using a variable:
myvar = "key"
Now I want to select this column through the myvar variable, perhaps in a select statement.
I tried this:
dataset.createOrReplaceTempView("dataset")
spark.sql(" select $myvar from dataset ").show
but it fails with this error:
no viable alternative at input 'select $'(line 1, pos 8)
How do I achieve this in PySpark?
Note that the columns may change in the future, and I want to pass more than one variable, or perhaps a list, into the SELECT clause.
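On the spark.sql attempt: Python does not substitute $myvar inside an ordinary string, so Spark's SQL parser receives the literal text $myvar and rejects it. One workaround is to build the query string in Python (here with an f-string) before handing it to spark.sql. This is a minimal sketch; build_select_sql is a hypothetical helper, not part of PySpark. It wraps each column name in backticks, which is how Spark SQL quotes identifiers, doubling any backtick inside a name.

```python
from typing import Sequence, Union


def build_select_sql(columns: Union[str, Sequence[str]], table: str) -> str:
    """Hypothetical helper: build a SELECT statement from column names in variables.

    Accepts a single column name or a list/tuple of names.
    """
    # Normalize a single name into a one-element list.
    names = [columns] if isinstance(columns, str) else list(columns)
    if not names:
        raise ValueError("at least one column name is required")
    # Backtick-quote each identifier; a literal backtick is escaped as two backticks.
    quoted = ", ".join("`" + name.replace("`", "``") + "`" for name in names)
    # The table name is inserted unquoted, so it should come from trusted code.
    return f"SELECT {quoted} FROM {table}"


myvar = "key"
print(build_select_sql(myvar, "dataset"))
# SELECT `key` FROM dataset
print(build_select_sql([myvar, "id"], "dataset"))
# SELECT `key`, `id` FROM dataset
```

With the question's Spark session and temp view, this would be used as spark.sql(build_select_sql(myvar, "dataset")).show().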
dataset.select(myvar)
selects the single column whose name is stored in the variable.
.select also accepts a list of column names, e.g. dataset.select([myvar, mySecondVar]), where mySecondVar holds another column name.
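If the column names may arrive either as one variable or as a list, a small wrapper can handle both cases. This is a minimal sketch; select_by_names is a hypothetical helper, and it relies only on DataFrame.select accepting column names as strings, passed here as separate arguments via * unpacking. It takes the DataFrame as a parameter, so it needs no PySpark import of its own.

```python
from typing import Sequence, Union


def select_by_names(df, columns: Union[str, Sequence[str]]):
    """Hypothetical helper: select columns whose names are held in variables.

    `df` is expected to be a PySpark DataFrame; `columns` is a single
    column name or a list/tuple of names.
    """
    # Treat a single string as one column name, not as a sequence of characters.
    names = [columns] if isinstance(columns, str) else list(columns)
    # Equivalent to df.select("key") or df.select("key", "other").
    return df.select(*names)
```

For example, select_by_names(dataset, myvar).show() would display the key column, and select_by_names(dataset, [myvar, mySecondVar]) would select both columns.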