I have some problem with the withColumn
function in Spark-Scala environment.
I would like to add a new Column in my DataFrame like that:
+---+----+---+
| A| B| C|
+---+----+---+
| 4|blah| 2|
| 2| | 3|
| 56| foo| 3|
|100|null| 5|
+---+----+---+
became:
+---+----+---+-----+
| A| B| C| D |
+---+----+---+-----+
| 4|blah| 2| 750|
| 2| | 3| 750|
| 56| foo| 3| 750|
|100|null| 5| 750|
+---+----+---+-----+
the column D in one value repeated N-time for each row in my DataFrame.
The code are this:
var totVehicles : Double = df_totVehicles(0).getDouble(0); //return 750
The variable totVehicles returns the correct value, it's works!
The second DataFrame has to calculate 2 fields (id_zipcode, n_vehicles), and add the third column (with the same value -750):
var df_nVehicles =
df_carPark.filter(
substring($"id_time",1,4) < 2013
).groupBy(
$"id_zipcode"
).agg(
sum($"n_vehicles") as 'n_vehicles
).select(
$"id_zipcode" as 'id_zipcode,
'n_vehicles
).orderBy(
'id_zipcode,
'n_vehicles
);
Finally, I add the new column with withColumn
function:
var df_nVehicles2 = df_nVehicles.withColumn(totVehicles, df_nVehicles("n_vehicles") + df_nVehicles("id_zipcode"))
But Spark returns me this error:
error: value withColumn is not a member of Unit
var df_nVehicles2 = df_nVehicles.withColumn(totVehicles, df_nVehicles("n_vehicles") + df_nVehicles("id_zipcode"))
Can you help me? Thank you very much!
You can add multiple columns to Spark DataFrame in several ways if you wanted to add a known set of columns you can easily do by chaining withColumn() or on select(). However, sometimes you may need to add multiple columns after applying some transformations n that case you can use either map() or foldLeft().
A new column could be added to an existing Dataset using Dataset. withColumn() method. withColumn accepts two arguments: the column name to be added, and the Column and returns a new Dataset<Row>. The syntax of withColumn() is provided below.
In order to repeat the column in pyspark we will be using repeat() Function. We look at an example on how to repeat the string of the column in pyspark. Repeat the string of the column in pyspark using repeat() function.
In Spark SQL, the withColumn() function is the most popular one, which is used to derive a column from multiple columns, change the current value of a column, convert the datatype of an existing column, create a new column, and many more.
lit
function is for adding literal values as a column
import org.apache.spark.sql.functions._
df.withColumn("D", lit(750))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With