I registered a temp table from a DataFrame that has white space in a column header. How can I extract that column in a SQL query via sqlContext? I tried to use back-ticks but it is not working:
df1 = sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score as Z_Score` from tmp1 """)
To remove both leading and trailing spaces from a column in PySpark we use the trim() function. trim() takes a column (or column name) and strips white space from both ends. Spark also provides ltrim() and rtrim() to strip only the leading or only the trailing white space, as in the sketch below.
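A minimal sketch of the three functions; the DataFrame and its name column here are made-up examples, not data from the question:

from pyspark.sql import SparkSession
from pyspark.sql.functions import trim, ltrim, rtrim

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("  Alice  ",), (" Bob ",)], ["name"])

df.select(
    trim("name").alias("trimmed"),    # strips both leading and trailing spaces
    ltrim("name").alias("ltrimmed"),  # strips leading spaces only
    rtrim("name").alias("rtrimmed"),  # strips trailing spaces only
).show()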
You can get all the columns of a Spark DataFrame by using df.columns; it returns the column names as a list in Python (Array[String] in Scala).
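For example, to check the exact header that was registered (df here stands for whatever DataFrame was registered as tmp1):

print(df.columns)   # e.g. ['Company', 'Sector', 'Industry', 'Altman Z-score']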
Problem: in Spark or PySpark, how do you remove white spaces (blanks) in a DataFrame string column, similar to TRIM in SQL? In Spark and PySpark (Spark with Python) you can do this with pyspark.sql.functions.trim(). Spark provides several ways to handle white space in data. The most general is regexp_replace(), which can remove any whitespace, not just leading or trailing, but it is not always easy to use; for the common cases the simpler trim(), ltrim(), and rtrim() functions shown above are enough. A regexp_replace() sketch follows.
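A minimal regexp_replace() sketch that strips every whitespace character from a value; df and its name column are assumed example names:

from pyspark.sql.functions import regexp_replace

df.withColumn("name_no_spaces", regexp_replace("name", r"\s+", "")).show()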
To remove only the leading space of a column in PySpark we use the ltrim() function. ltrim() takes a column and trims the left white space from it: df_states = df_states.withColumn('states_Name', ltrim(df_states.state_name)). The resulting table will have the leading space removed.
Removing spaces from column names in pandas is not hard either: you can rewrite the column labels with str.replace(), optionally substituting another character such as an underscore for the space.
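A small pandas sketch; the DataFrame here is a made-up example:

import pandas as pd

pdf = pd.DataFrame({"Altman Z-score": [1.2], "Company Name": ["Acme"]})
pdf.columns = pdf.columns.str.replace(" ", "_")   # replace spaces with underscores
print(pdf.columns.tolist())                       # ['Altman_Z-score', 'Company_Name']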
You have to place only the column name within back-ticks, not its alias:
Without Alias:
df1 = sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score` as Z_Score from tmp1""")
With Alias:
df1 = sqlContext.sql("""select t1.Company, t1.Sector, t1.Industry, t1.`Altman Z-score` as Z_Score from tmp1 t1""")
There is a problem in the query; the corrected query is below (only the column name `Altman Z-score` is wrapped in back-ticks, the alias Z_Score is not):
df1 = sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score` as Z_Score from tmp1 """)
One more alternative:
import pyspark.sql.functions as F
df1 = sqlContext.sql("""select * from tmp1 """)
df1.select(F.col("Altman Z-score").alias("Z_Score")).show()
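Note that F.col("Altman Z-score") needs no back-ticks: the DataFrame API does not run the name through the SQL parser, so the space is not a problem.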