I am defining a column object like this:
column = F.col('foo').alias('bar')
I know I can get the full expression using str(column).
But how can I get the column's alias only?
In the example, I'm looking for a function get_column_name where get_column_name(column) returns the string bar.
One way is through regular expressions:
from pyspark.sql.functions import col
column = col('foo').alias('bar')
print(column)
#Column<foo AS `bar`>
import re
print(re.findall(r"(?<=AS `)\w+(?=`>$)", str(column))[0])
#bar
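The regex above can be wrapped into the get_column_name function the question asks for. This is only a sketch: the string form of a Column is not a stable API, and the shapes it tolerates here (optional backticks around the alias, an optional quote before the closing bracket) are assumptions about what different Spark and Python versions print. The pattern name _ALIAS_PATTERN is made up for this example.

```python
import re

# Assumed repr shapes: the alias may or may not be wrapped in backticks,
# and the expression may or may not be wrapped in quotes.
_ALIAS_PATTERN = re.compile(r"\bAS\s+`?(\w+)`?'?>$")


def get_column_name(column):
    """Return the alias found at the end of str(column), or None if there is none."""
    match = _ALIAS_PATTERN.search(str(column))
    return match.group(1) if match else None
```

Because it only looks at str(column), it accepts a Column or a plain string, and it returns None instead of raising when no trailing alias is found. Multi-name aliases are not handled.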
Alternatively, we could use a wrapper function to tweak the behavior of the Column.alias and Column.name methods so that they store just the alias in an AS attribute:
from pyspark.sql import Column, SparkSession
from pyspark.sql.functions import col, explode, array, struct, lit
SparkSession.builder.getOrCreate()
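def alias_wrapper(self, *alias, **kwargs):
    # Call the original alias method, saved below as Column._alias
    renamed_col = Column._alias(self, *alias, **kwargs)
    # Store a single alias as a string, multiple aliases as a tuple
    renamed_col.AS = alias[0] if len(alias) == 1 else alias
    return renamed_col
# Save the original alias, then patch alias and name; AS defaults to None
Column._alias, Column.alias, Column.name, Column.AS = Column.alias, alias_wrapper, alias_wrapper, None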
which then guarantees:
assert(col("foo").alias("bar").AS == "bar")
# `name` should act like `alias`
assert(col("foo").name("bar").AS == "bar")
# column without alias should have None in `AS`
assert(col("foo").AS is None)
# multialias should be handled
assert(explode(array(struct(lit(1), lit("a")))).alias("foo", "bar").AS == ("foo", "bar"))
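One caveat with patching a class this way: if the patching statement runs a second time (for example when a notebook cell is re-executed), the saved "original" method is already the wrapper, so the wrapper ends up calling itself. The sketch below shows one way to guard against that. It uses a hypothetical stand-in class, FakeColumn, so it runs without Spark; install_alias_tracking and the _alias_patched flag are likewise invented for illustration, and whether this guard suits the real Column class is an assumption.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column, for illustration only."""

    def __init__(self, expr):
        self.expr = expr

    def alias(self, *alias, **kwargs):
        # Mimics Column.alias by returning a new object
        return FakeColumn(f"{self.expr} AS {', '.join(alias)}")


def install_alias_tracking(cls):
    """Patch cls.alias once so that aliased objects remember their alias in .AS."""
    if getattr(cls, "_alias_patched", False):
        return  # already patched; patching again would make the wrapper call itself

    original_alias = cls.alias  # keep a reference to the unpatched method

    def alias_wrapper(self, *alias, **kwargs):
        renamed = original_alias(self, *alias, **kwargs)
        # Single alias is stored as a string, several aliases as a tuple
        renamed.AS = alias[0] if len(alias) == 1 else alias
        return renamed

    cls.alias = alias_wrapper
    cls.AS = None  # default for objects that were never aliased
    cls._alias_patched = True


install_alias_tracking(FakeColumn)
install_alias_tracking(FakeColumn)  # second call is a no-op

aliased = FakeColumn("foo").alias("bar")
```

Unlike the snippet above, this sketch patches only alias; a name method would need the same treatment.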