How to name aggregate columns?

Tags:

scala

apache-spark

apache-spark-dataset

I'm using Spark in Scala and my aggregated columns are anonymous. Is there a convenient way to rename multiple columns from a dataset? I thought about imposing a schema with as but the key column is a struct (due to the groupBy operation), and I can't find out how to define a case class with a StructType in it.

I tried defining a schema as follows:

val returnSchema = StructType(StructField("edge", StructType(StructField("src", IntegerType, true),
                                                             StructField("dst", IntegerType), true)), 
                              StructField("count", LongType, true))
edge_count.as[returnSchema]

but I got a compile error:

Message: <console>:74: error: overloaded method value apply with alternatives:
  (fields: Array[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType <and>
  (fields: java.util.List[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType <and>
  (fields: Seq[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType
 cannot be applied to (org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, Boolean)
       val returnSchema = StructType(StructField("edge", StructType(StructField("src", IntegerType, true),

asked Jul 25 '16 19:07

Emre

1 Answer

The best solution is to name your columns explicitly, e.g.,

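// Assumes a SparkSession named `spark` is in scope; its implicits enable the 'a symbol syntax.
import spark.implicits._
import org.apache.spark.sql.functions.expr

df
  .groupBy('a, 'b)
  .agg(
    expr("count(*) as cnt"),   // alias written inside the SQL expression
    expr("sum(x) as x"),
    expr("sum(y)").as("y")     // alias applied through the Column API
  )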

If you are using a dataset, you have to provide the type of your columns, e.g., expr("count(*) as cnt").as[Long].
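
As a rough sketch of the typed variant: the `Edge` and `EdgeCount` case classes and the `edgeCounts` helper below are hypothetical names made up for illustration, not something from the question. The sketch assumes that a nested case class is an acceptable way to model the struct key, and it groups with `groupByKey` so that the aggregate can be passed as a typed column.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.expr

// Hypothetical types for illustration; a nested case class maps to a struct column.
case class Edge(src: Int, dst: Int)
case class EdgeCount(edge: Edge, count: Long)

object TypedAggregateSketch {
  // Counts identical edges and returns the result with named, typed fields.
  def edgeCounts(spark: SparkSession, edges: Dataset[Edge]): Dataset[EdgeCount] = {
    import spark.implicits._
    edges
      .groupByKey(e => e)                      // the whole Edge is the grouping key
      .agg(expr("count(*) as cnt").as[Long])   // typed column, so the result is Dataset[(Edge, Long)]
      .map { case (edge, cnt) => EdgeCount(edge, cnt) }
  }
}
```

Mapping the `(Edge, Long)` tuples into `EdgeCount` is what gives the final columns their names (`edge` and `count`); the case classes need to be defined at the top level so Spark can derive encoders for them.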

You can use the DSL directly (the functions in org.apache.spark.sql.functions), but I often find it to be more verbose than simple SQL expressions.
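
For comparison, here is a minimal sketch of the same aggregation written with the DSL. It assumes a DataFrame with columns `a`, `b`, `x` and `y`, as in the snippet above; the `DslAggregateSketch` object and its `aggregate` method are hypothetical names.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count, sum}

object DslAggregateSketch {
  // Assumes df has columns a, b, x and y.
  def aggregate(df: DataFrame): DataFrame =
    df.groupBy(col("a"), col("b"))
      .agg(
        count("*").as("cnt"),   // same as expr("count(*) as cnt")
        sum("x").as("x"),
        sum("y").as("y")
      )
}
```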

If you want to do mass renames, put the old-to-new column names in a Map and foldLeft over it, renaming one column of the dataframe at each step.
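
One possible shape for that mass rename, as a sketch: the `RenameSketch` object and `renameAll` helper are hypothetical names, and the only Spark call used is `withColumnRenamed`.

```scala
import org.apache.spark.sql.DataFrame

object RenameSketch {
  // Applies each old -> new pair in turn; withColumnRenamed is a no-op
  // for names that are not present, so unrelated columns are left untouched.
  def renameAll(df: DataFrame, renames: Map[String, String]): DataFrame =
    renames.foldLeft(df) { case (acc, (from, to)) =>
      acc.withColumnRenamed(from, to)
    }
}
```

It could be called as `RenameSketch.renameAll(aggregated, Map("sum(x)" -> "x", "sum(y)" -> "y"))`, where `aggregated` is your own DataFrame. The auto-generated names of anonymous aggregate columns can differ between Spark versions, so it is worth checking `aggregated.columns` before building the Map.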

answered Sep 17 '22 14:09

Sim
