Spark, add new Column with the same value in Scala [duplicate]

Tags:

scala

apache-spark

spark-dataframe

I have a problem with the withColumn function in a Spark-Scala environment. I would like to add a new column to my DataFrame, so that this:

+---+----+---+
|  A|   B|  C|
+---+----+---+
|  4|blah|  2|
|  2|    |  3|
| 56| foo|  3|
|100|null|  5|
+---+----+---+

becomes:

+---+----+---+-----+
|  A|   B|  C|    D|
+---+----+---+-----+
|  4|blah|  2|  750|
|  2|    |  3|  750|
| 56| foo|  3|  750|
|100|null|  5|  750|
+---+----+---+-----+

Column D holds a single value, repeated on every row of the DataFrame.

This is the code:

var totVehicles : Double = df_totVehicles(0).getDouble(0); //return 750

The variable totVehicles holds the correct value, so this part works.

The second DataFrame computes two fields (id_zipcode, n_vehicles) and should get a third column holding the same value (750) on every row:

var df_nVehicles =
df_carPark.filter(
      substring($"id_time",1,4) < 2013
    ).groupBy(
      $"id_zipcode"
    ).agg(
      sum($"n_vehicles") as 'n_vehicles
    ).select(
      $"id_zipcode" as 'id_zipcode,
      'n_vehicles
    ).orderBy(
      'id_zipcode,
      'n_vehicles
    );

Finally, I add the new column with the withColumn function:

var df_nVehicles2 = df_nVehicles.withColumn(totVehicles, df_nVehicles("n_vehicles") + df_nVehicles("id_zipcode"))

But Spark gives me this error:

 error: value withColumn is not a member of Unit
         var df_nVehicles2 = df_nVehicles.withColumn(totVehicles, df_nVehicles("n_vehicles") + df_nVehicles("id_zipcode"))

Can you help me? Thank you very much!

asked Jul 26 '16 10:07

Alessandro

1 Answer

The lit function turns a literal value into a column, which withColumn can then add under the given name:

import org.apache.spark.sql.functions._
df.withColumn("D", lit(750))
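Applied to the question's variables, this could look like the sketch below. It assumes a local SparkSession and uses a small hypothetical stand-in for df_nVehicles; the column name "tot_vehicles" is also made up for illustration. The point is that the first argument of withColumn is the new column's name as a String, and the Double value goes inside lit as the second argument.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("lit-example")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for df_nVehicles from the question.
val dfNVehicles = Seq(("10001", 120L), ("10002", 630L))
  .toDF("id_zipcode", "n_vehicles")

// Stand-in for the total read from df_totVehicles in the question.
val totVehicles: Double = 750.0

// First argument: the new column's name (a String, assumed here).
// Second argument: a Column, built from the constant with lit.
val dfNVehicles2 = dfNVehicles.withColumn("tot_vehicles", lit(totVehicles))

dfNVehicles2.show()
```

Passing totVehicles itself as the first argument, as in the question, does not match the withColumn signature, since a Double is not a column name.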

answered Sep 28 '22 20:09

Rockie Yang
