
Spark dataframe get column value into a string variable

I am trying to extract a column value into a variable so that I can use it elsewhere in the code. I tried the following:

 val name= test.filter(test("id").equalTo("200")).select("name").col("name") 

It returns:

 name: org.apache.spark.sql.Column = name

How do I get the actual value?

G G asked Jun 10 '16 16:06




1 Answer

Calling col("name") gives you a Column expression, not the data. To extract the values from column "name", do the same thing without col("name") and collect the result:

 val names = test.filter(test("id").equalTo("200"))
                 .select("name")
                 .collectAsList() // returns a java.util.List[Row]

Then, for a Row taken from that list (for example val row = names.get(0)), you can get the name as a String with:

val name = row.getString(0) 
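
If only a single value is expected, the two steps above can be combined. The following is a minimal sketch, not the only way to do it: the object name ColumnValueExample, the helper nameForId, the local SparkSession, and the sample rows are all made up for illustration and stand in for the `test` DataFrame from the question. It assumes "id" and "name" are both string columns.

```scala
import org.apache.spark.sql.SparkSession

object ColumnValueExample {
  // Hypothetical helper: returns the name for the given id,
  // or None when no row matches.
  def nameForId(spark: SparkSession, id: String): Option[String] = {
    import spark.implicits._

    // Assumed sample data standing in for the question's `test` DataFrame.
    val test = Seq(("100", "alice"), ("200", "bob")).toDF("id", "name")

    test.filter(test("id").equalTo(id))
      .select("name")
      .as[String]   // typed Dataset[String] instead of a DataFrame of Rows
      .head(1)      // Array[String] holding at most one element
      .headOption   // avoids an exception when the filter matches nothing
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("column-value-example")
      .getOrCreate()
    try {
      println(nameForId(spark, "200"))
    } finally {
      spark.stop()
    }
  }
}
```

Using head(1) with headOption rather than first() means a missing id yields None instead of throwing. Like collectAsList(), this brings data back to the driver, so it is only appropriate when the filtered result is small.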
Yuan JI answered Sep 20 '22 14:09
