 

Is it possible to alias columns programmatically in Spark SQL?

In Spark SQL (perhaps only in HiveQL) one can write:

select sex, avg(age) as avg_age from humans group by sex 

which would result in a DataFrame with columns named "sex" and "avg_age".

How can avg(age) be aliased to "avg_age" without using textual SQL?

Edit: After zero323's answer, I need to add the constraint that:

The name of the column to be renamed may not be known or guaranteed, or even addressable. In textual SQL, "select EXPR as NAME" removes the need for an intermediate name for EXPR. This is also the case in the example above, where "avg(age)" could get a variety of auto-generated names (which also vary among Spark releases and SQL-context backends).

Asked by Prikso NAI on Jul 21 '15.

People also ask

How do you alias a column name in PySpark?

To create an alias of a column, use the .alias() method. This method is the SQL equivalent of the AS keyword used to create aliases; it gives a temporary name to a column of the output PySpark DataFrame.

How do I specify a column alias in SQL?

The basic syntax of a table alias is: SELECT column1, column2 ... FROM table_name AS alias_name WHERE [condition]; The basic syntax of a column alias is: SELECT column_name AS alias_name FROM table_name;

What does alias do in Spark?

An alias of a PySpark DataFrame column changes the name of the column without changing its type or the data.


2 Answers

Let's suppose human_df is the DataFrame for humans. Since Spark 1.3:

import org.apache.spark.sql.functions.avg

human_df.groupBy("sex").agg(avg("age").alias("avg_age"))
Answered by Robert Chevallier on Sep 24 '22.

If you want to rename a single column you can use the withColumnRenamed method:

case class Person(name: String, age: Int)

val df = sqlContext.createDataFrame(
  Person("Alice", 2) :: Person("Bob", 5) :: Nil)

df.withColumnRenamed("name", "first_name")

Alternatively you can use the alias method:

import org.apache.spark.sql.functions.avg

df.select(avg($"age").alias("average_age"))

You can take it further with a small helper:

import org.apache.spark.sql.Column

def normalizeName(c: Column) = {
  val pattern = "\\W+".r
  c.alias(pattern.replaceAllIn(c.toString, "_"))
}

df.select(normalizeName(avg($"age")))
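The renaming logic inside normalizeName can be checked on a plain string, without a Spark session. Assuming the column's toString renders as "avg(age)" (the exact form varies across Spark versions, so treat it as an illustrative input), the regex collapses every run of non-word characters into a single underscore:

```scala
// Illustrative sketch: the same regex the helper uses, applied to a sample name.
// "avg(age)" here stands in for a hypothetical avg($"age").toString rendering.
val pattern = "\\W+".r
val normalized = pattern.replaceAllIn("avg(age)", "_")
println(normalized) // avg_age_
```

Note the trailing underscore left by the closing parenthesis; a stricter helper could trim leading and trailing underscores from the result.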
Answered by zero323 on Sep 23 '22.