I have a val ds: Dataset[Double] (in Spark 2.0.0). What is the name of the double-valued column that can be passed to apply or col to convert this one-column Dataset to a Column?
Another way to see the column names in a DataFrame is to inspect its schema: calling printSchema() prints the schema of the DataFrame, and all the column names can be read from that output.
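As a minimal sketch (assuming spark.implicits._ is in scope so that toDS() is available), printSchema() on a one-column Dataset[Double] reveals the default column name:

```scala
// A one-column Dataset[Double]; the column created by the Double
// encoder is named "value" by default.
val ds = Seq(1.0, 2.0, 3.0).toDS()
ds.printSchema()
// root
//  |-- value: double (nullable = false)
```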
You can get all the columns of a Spark DataFrame by using df.columns, which returns the column names as an Array[String].
To convert a Spark DataFrame column to a List, first select() the column you want, then use the map() transformation to extract the value from each Row, and finally collect() the data to the driver, which returns an Array.
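A minimal sketch of that select/map/collect chain; the DataFrame df and its string column "name" are hypothetical, chosen only for illustration:

```scala
// Requires spark.implicits._ in scope for the String encoder used by map().
// `df` is assumed to be an existing DataFrame with a string column "name".
val names: Array[String] = df
  .select("name")
  .map(row => row.getString(0))
  .collect()
```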
To select a column from a Dataset, use the apply method in Scala or col in Java. Note that the Column type can also be manipulated through its various functions.
The column name is "value", as in ds.col("value"). Dataset.schema contains this information: ds.schema.fields.foreach(x => println(x))
You could also use the DataFrame method columns, which returns all column names as an Array of Strings.
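To make the answer concrete, a minimal sketch (assuming a SparkSession named spark is already in scope):

```scala
import org.apache.spark.sql.{Column, Dataset}

// Assumes a SparkSession named `spark` is already available.
import spark.implicits._

val ds: Dataset[Double] = Seq(1.0, 2.0, 3.0).toDS()

// The single column of a Dataset[Double] is named "value".
val c: Column = ds.col("value")   // equivalently: ds("value") via apply
ds.select(c * 2).show()
```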
case class Person(age: Int, height: Int, weight: Int){
def sum = age + height + weight
}
val df = sc.parallelize(List(Person(1,2,3), Person(4,5,6))).toDF("age", "height", "weight")
df.columns
//res0: Array[String] = Array(age, height, weight)