In Spark SQL (perhaps only HiveQL) one can do:

select sex, avg(age) as avg_age from humans group by sex

which would result in a DataFrame with columns named "sex" and "avg_age".

How can avg(age) be aliased to "avg_age" without using textual SQL?
Edit: After zero323's answer, I need to add the constraint that:

The column-to-be-renamed's name may not be known, guaranteed, or even addressable. In textual SQL, "select EXPR as NAME" removes the need for an intermediate name for EXPR. This is also the case in the example above, where "avg(age)" could get a variety of auto-generated names (which also vary among Spark releases and SQL-context backends).
The .alias() method is the DataFrame API's equivalent of the SQL AS keyword: it gives a column of the output DataFrame a new name without changing its type or its data.
Let's suppose human_df is the DataFrame for humans. Since Spark 1.3:

import org.apache.spark.sql.functions.avg

human_df.groupBy("sex").agg(avg("age").alias("avg_age"))
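The grouped average this query expresses can be sketched with plain Scala collections, so it runs without Spark; the Person fields and the sample rows here are assumptions for illustration only:

```scala
// Plain-Scala sketch of what groupBy("sex").agg(avg("age")) computes.
// Person and the sample data are illustrative assumptions, not Spark API.
case class Person(sex: String, age: Int)

val humans = List(Person("f", 30), Person("f", 40), Person("m", 20))

// group rows by sex, then average the ages within each group
val avgAgeBySex: Map[String, Double] =
  humans.groupBy(_.sex).map { case (s, ps) =>
    s -> ps.map(_.age).sum.toDouble / ps.size
  }
// avgAgeBySex("f") == 35.0, avgAgeBySex("m") == 20.0
```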
If you prefer to rename a single column, you can use the withColumnRenamed method:

case class Person(name: String, age: Int)

val df = sqlContext.createDataFrame(
  Person("Alice", 2) :: Person("Bob", 5) :: Nil)

df.withColumnRenamed("name", "first_name")
Alternatively you can use the alias method:

import org.apache.spark.sql.functions.avg

df.select(avg($"age").alias("average_age"))
You can take it further with a small helper:

import org.apache.spark.sql.Column

def normalizeName(c: Column) = {
  val pattern = "\\W+".r
  c.alias(pattern.replaceAllIn(c.toString, "_"))
}

df.select(normalizeName(avg($"age")))
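To see what the helper's regex actually produces, the same replacement can be applied to a plain string with no Spark involved; "avg(age)" here stands in for one possible auto-generated column name:

```scala
import scala.util.matching.Regex

// \W+ matches any run of non-word characters (anything outside [A-Za-z0-9_])
val pattern: Regex = "\\W+".r

// each such run collapses to a single underscore
val cleaned = pattern.replaceAllIn("avg(age)", "_")
// note the trailing underscore left by the closing parenthesis: "avg_age_"
```

So the helper yields a stable, identifier-safe name regardless of what the backend auto-generates, though punctuation at the end of the name becomes a trailing underscore.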