 

How to query the column names of a Spark Dataset?

I have a val ds: Dataset[Double] (in Spark 2.0.0). What is the name of the double-valued column that can be passed to apply or col to convert this one-column Dataset to a Column?

asked Sep 19 '16 at 17:09 by fred271828



People also ask

How do I get column names from Spark DataFrame?

Another way to see the names of the columns in a DataFrame is to look at its schema. The printSchema() function prints the DataFrame's schema, and every column name appears in that output.
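A minimal sketch of the schema-based approach, assuming Spark 2.x or later, a local SparkSession (master `local[*]`), and a small hypothetical DataFrame with `name` and `age` columns. printSchema() writes the tree to stdout; if you need the names in code, StructType's fieldNames returns the same names as an array.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration; in spark-shell, `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("print-schema-sketch").getOrCreate()
import spark.implicits._ // provides toDF on local Seqs

// Hypothetical example data
val df = Seq(("Alice", 29), ("Bob", 31)).toDF("name", "age")

// Prints the schema as a tree, one line per column
df.printSchema()

// The same column names, read programmatically from the schema
val fieldNames: Array[String] = df.schema.fieldNames
```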

How do I get all column names in Spark?

You can get all the column names of a Spark DataFrame with df.columns, which returns them as an Array[String].

How do I get columns from Spark DataFrame?

To convert a Spark DataFrame column to a list, first select() the column you want, then use the map() transformation to convert each Row to a String, and finally collect() the data to the driver, which returns an Array[String] (call toList on it if you need a List).
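The select/map/collect steps can be sketched as follows, assuming Spark 2.x or later, a local SparkSession, and a hypothetical DataFrame whose `name` column holds strings. The map() call needs an Encoder[String], which spark.implicits._ supplies.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration; in spark-shell, `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("column-to-list-sketch").getOrCreate()
import spark.implicits._ // supplies toDF and the Encoder[String] that map() needs

// Hypothetical example data
val df = Seq(("Alice", 29), ("Bob", 31)).toDF("name", "age")

// 1. select() the column, 2. map() each Row to a String, 3. collect() to the driver
val names: Array[String] = df.select("name").map(row => row.getString(0)).collect()

// Convert the Array to a List if that is the type you need
val nameList: List[String] = names.toList
```

For a single column, df.select("name").as[String].collect() is an equivalent typed alternative that avoids handling Row objects directly.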

How do I select columns in Spark Dataset?

To select a column from the Dataset, use the apply method in Scala and col in Java. Note that the Column type can also be manipulated through its various functions. In Java: // To create Dataset<Row> using SparkSession Dataset<Row> people = spark.
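A short Scala sketch of selecting and manipulating a column, assuming Spark 2.x or later, a local SparkSession, and a hypothetical Dataset[Double]. In Scala, ds("value") is shorthand for Dataset.apply, and ds.col("value") is the equivalent named method (the one Java code calls). The alias "doubled" is an arbitrary name chosen for the example.

```scala
import org.apache.spark.sql.{Column, SparkSession}

// Local session for illustration; in spark-shell, `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("select-column-sketch").getOrCreate()
import spark.implicits._ // provides toDS and the Encoder[Double]

// Hypothetical one-column Dataset[Double]
val ds = Seq(1.0, 2.5, 4.0).toDS()

// Two equivalent ways to obtain the same Column
val viaApply: Column = ds("value")
val viaCol: Column = ds.col("value")

// Columns support operators such as * and can be renamed with as(alias)
val doubled: Array[Double] = ds.select((viaApply * 2).as("doubled")).as[Double].collect()
```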


2 Answers

The column name is "value", as in ds.col("value"). Dataset.schema contains this information: ds.schema.fields.foreach(x => println(x))
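A self-contained sketch of this answer, assuming Spark 2.x or later and a local SparkSession; the values in the Dataset are hypothetical. Spark's encoder for a primitive type such as Double produces a single column named "value", which shows up both in the schema and in ds.columns.

```scala
import org.apache.spark.sql.{Column, Dataset, SparkSession}

// Local session for illustration; in spark-shell, `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("dataset-value-column").getOrCreate()
import spark.implicits._ // provides toDS and the Encoder[Double]

// Hypothetical Dataset[Double]; its only column is named "value"
val ds: Dataset[Double] = Seq(1.0, 2.5, 4.0).toDS()

// Print each StructField (name, data type, nullability) in the schema
ds.schema.fields.foreach(println)

// The column names are also available directly
val columnNames: Array[String] = ds.columns

// Convert the one-column Dataset to a Column by name
val valueColumn: Column = ds.col("value")
```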

answered Oct 13 '22 at 23:10 by fred271828



You could also use the DataFrame method columns, which returns all column names as an Array of Strings.

import spark.implicits._ // provides toDF (already imported automatically in spark-shell)

case class Person(age: Int, height: Int, weight: Int) {
  def sum = age + height + weight
}

val df = sc.parallelize(List(Person(1, 2, 3), Person(4, 5, 6))).toDF("age", "height", "weight")

df.columns
//res0: Array[String] = Array(age, height, weight)
answered Oct 14 '22 at 00:10 by Alberto Bonsanto
