How do I remove the "." from a Spark DataFrame column name?
Using DataFrame.select(F.col(...).alias(...)) to rename a column whose name contains a "." throws an error.
The following code reproduces the problem.
# Import the Spark session builder and the column functions.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# Start Spark.
spark = SparkSession.builder.appName("test").getOrCreate()
testdf = spark.createDataFrame([
(1, "Julie", "CEO"),
(2, "Janice", "CFO"),
(3, "Jake", "CTO")],
["ID", "First Name", "Title Initial."])
# This works: a space in the column name is not a problem.
testdf.select(F.col('First Name').alias('first_name')).show(3)
# This throws an error: the "." is parsed as a nested-field separator.
testdf.select(F.col('Title Initial.').alias('title')).show(3)
Error:
AnalysisException: u'syntax error in attribute name: Title Initial.;'
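What is an alternative way to rename DataFrame columns that have a "." in their names?
Surround the column name with backticks (`) so that Spark treats the "." as part of the name instead of as a nested-field separator: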
testdf.select(F.col('`Title Initial.`').alias('title')).show(3)
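If several columns contain dots or spaces, it may be easier to clean all the names at once than to quote each one. The sketch below is only an illustration: sanitize_column_name and quote_column_name are hypothetical helpers (not part of PySpark), and the cleaning rule (dots and whitespace become underscores, result lowercased) is an assumption you may want to change. The helpers are plain Python; the Spark calls that would use them appear as comments and assume the testdf from the question.

```python
import re


def sanitize_column_name(name):
    """Hypothetical helper: turn runs of dots/whitespace into "_" and lowercase."""
    cleaned = re.sub(r"[.\s]+", "_", name.strip())
    # Drop underscores left over from a leading or trailing dot.
    return cleaned.strip("_").lower()


def quote_column_name(name):
    """Hypothetical helper: wrap a name in backticks for use in F.col().

    Assumes the name itself contains no backtick characters.
    """
    return "`" + name + "`"


columns = ["ID", "First Name", "Title Initial."]
new_names = [sanitize_column_name(c) for c in columns]
print(new_names)
print(quote_column_name("Title Initial."))

# With the DataFrame from the question (requires a running SparkSession):
# renamed = testdf.toDF(*new_names)                                # rename every column
# renamed = testdf.withColumnRenamed("Title Initial.", "title")    # rename one column
# testdf.select(F.col(quote_column_name("Title Initial.")).alias("title")).show(3)
```

Renaming up front with toDF or withColumnRenamed means later code no longer needs backticks, whereas quoting keeps the original names intact.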