Spark, add new Column with the same value in Scala [duplicate]

I have a problem with the withColumn function in a Spark/Scala environment. I would like to add a new column to my DataFrame, so that this:

+---+----+---+
|  A|   B|  C|
+---+----+---+
|  4|blah|  2|
|  2|    |  3|
| 56| foo|  3|
|100|null|  5|
+---+----+---+

becomes:

+---+----+---+-----+
|  A|   B|  C|    D|
+---+----+---+-----+
|  4|blah|  2|  750|
|  2|    |  3|  750|
| 56| foo|  3|  750|
|100|null|  5|  750|
+---+----+---+-----+

Column D holds a single value, repeated in every row of my DataFrame.

The code is this:

var totVehicles : Double = df_totVehicles(0).getDouble(0); //return 750

The variable totVehicles holds the correct value, so this part works.

The second DataFrame has to compute two fields (id_zipcode, n_vehicles) and then get a third column that holds the same value (750) in every row:

var df_nVehicles =
df_carPark.filter(
      substring($"id_time",1,4) < 2013
    ).groupBy(
      $"id_zipcode"
    ).agg(
      sum($"n_vehicles") as 'n_vehicles
    ).select(
      $"id_zipcode" as 'id_zipcode,
      'n_vehicles
    ).orderBy(
      'id_zipcode,
      'n_vehicles
    );

Finally, I add the new column with the withColumn function:

var df_nVehicles2 = df_nVehicles.withColumn(totVehicles, df_nVehicles("n_vehicles") + df_nVehicles("id_zipcode"))

But Spark returns this error:

 error: value withColumn is not a member of Unit
         var df_nVehicles2 = df_nVehicles.withColumn(totVehicles, df_nVehicles("n_vehicles") + df_nVehicles("id_zipcode"))

Can you help me? Thank you very much!

asked Jul 26 '16 10:07 by Alessandro


1 Answer

The lit function adds a literal value as a column:

import org.apache.spark.sql.functions._
df.withColumn("D", lit(750))
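
Applied to the question's DataFrame, this might look like the sketch below. It assumes Spark 2.x or later (for SparkSession); the object name, the helper withTotal, the sample rows, and the column name totVehicles are all hypothetical, chosen only for illustration. Two details differ from the failing call: the first argument of withColumn is a String column name rather than the Double value, and the value itself is wrapped in lit. The "not a member of Unit" error also suggests that df_nVehicles was assigned the result of a call that returns Unit (the assumption here is a trailing .show()), so the sketch keeps the DataFrame in a val and calls show() separately.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, sum}

object AddConstantColumn {

  // Hypothetical helper: appends a column named "totVehicles" that holds
  // the same Double value in every row.
  def withTotal(df: DataFrame, totVehicles: Double): DataFrame =
    // withColumn(colName: String, col: Column): the name is a String,
    // and lit(...) turns the plain Scala value into a Column.
    df.withColumn("totVehicles", lit(totVehicles))

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .appName("AddConstantColumn")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for df_carPark from the question.
    val dfCarPark = Seq(
      ("10001", 4),
      ("10001", 2),
      ("10002", 56)
    ).toDF("id_zipcode", "n_vehicles")

    // Stands in for df_totVehicles(0).getDouble(0) from the question.
    val totVehicles: Double = 750.0

    // Keep the DataFrame itself in the val. Ending this chain with .show()
    // would make the val a Unit, and Unit has no withColumn method.
    val dfNVehicles = dfCarPark
      .groupBy($"id_zipcode")
      .agg(sum($"n_vehicles").as("n_vehicles"))
      .orderBy($"id_zipcode")

    val dfNVehicles2 = withTotal(dfNVehicles, totVehicles)

    // Display as a separate step, without reassigning the result.
    dfNVehicles2.show()

    spark.stop()
  }
}
```

In spark-shell, where spark and its implicits are already in scope, the one-line equivalent would be df_nVehicles.withColumn("totVehicles", lit(totVehicles)).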
answered Sep 28 '22 20:09 by Rockie Yang