I am trying extract column value into a variable so that I can use the value somewhere else in the code. I am trying like the following
val name= test.filter(test("id").equalTo("200")).select("name").col("name")
It returns
name org.apache.spark.sql.Column = name
how to get the value?
You can select the single or multiple columns of the Spark DataFrame by passing the column names you wanted to select to the select() function. Since DataFrame is immutable, this creates a new DataFrame with a selected columns. show() function is used to show the DataFrame contents.
PySpark Collect() – Retrieve data from DataFrame. Collect() is the function, operation for RDD or Dataframe that is used to retrieve the data from the Dataframe. It is used useful in retrieving all the elements of the row from each partition in an RDD and brings that over the driver node/program.
take (num: int) → List[T][source] Take the first num elements of the RDD. It works by first scanning one partition, and use the results from that partition to estimate the number of additional partitions needed to satisfy the limit. Translated from the Scala implementation in RDD#take().
The col("name")
gives you a column expression. If you want to extract data from column "name" just do the same thing without col("name")
:
val names = test.filter(test("id").equalTo("200")) .select("name") .collectAsList() // returns a List[Row]
Then for a row you could get name in String by:
val name = row.getString(0)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With