How to handle white spaces in dataframe column names in spark

I registered a temp table from a DataFrame that has white space in its column headers. How can I select such a column in a SQL query via sqlContext? I tried using back-ticks, but it is not working:

df1 =  sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score as Z_Score` from tmp1 """)
ben asked Mar 30 '17 03:03



2 Answers

You have to place only the column name within back-ticks, not its alias:

Without a table alias:

df1 = sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score` as Z_Score from tmp1""")

With a table alias:

df1 =  sqlContext.sql("""select t1.Company, t1.Sector, t1.Industry, t1.`Altman Z-score` as Z_Score from tmp1 t1""")
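
When queries are built in code, the same rule (back-ticks around the column name only, alias outside) can be applied by a small helper. The following is only a sketch: quote_identifier and build_select are hypothetical names, not part of Spark, and it assumes that a literal back-tick inside a name is escaped by doubling it.

```python
def quote_identifier(name):
    """Wrap a name in back-ticks so Spark SQL reads it as a single identifier."""
    # Assumption: a literal back-tick inside the name is escaped by doubling it.
    return "`" + name.replace("`", "``") + "`"


def build_select(table, columns):
    """Build a SELECT statement from (column_name, alias_or_None) pairs."""
    parts = []
    for name, alias in columns:
        expr = quote_identifier(name)
        if alias is not None:
            # The alias is appended outside the back-ticks of the column name.
            expr += " AS " + quote_identifier(alias)
        parts.append(expr)
    return "SELECT " + ", ".join(parts) + " FROM " + quote_identifier(table)


query = build_select(
    "tmp1",
    [("Company", None), ("Sector", None), ("Industry", None),
     ("Altman Z-score", "Z_Score")],
)
print(query)
```

The resulting string can then be passed to sqlContext.sql(query) in the same way as the hand-written queries above.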
himanshuIIITian answered Oct 10 '22 22:10


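The problem is in the query: the alias was placed inside the back-ticks. In the corrected query below, only the column name is wrapped in back-ticks and the alias (as Z_Score) sits outside them: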

df1 =  sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score` as Z_Score from tmp1 """)

Another alternative, using the DataFrame API instead of a SQL alias:

import pyspark.sql.functions as F
df1 =  sqlContext.sql("""select * from tmp1 """)
df1.select(F.col("Altman Z-score").alias("Z_Score")).show()
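
A related option is to rename the columns once, before registering the temp table, so that later queries need no back-ticks at all. The sketch below only computes the new names; sanitize_column_names is a hypothetical helper, and the choice of underscore as the replacement character is an assumption.

```python
import re


def sanitize_column_names(names):
    """Return SQL-friendly versions of the given column names."""
    cleaned = []
    for name in names:
        # Replace each run of characters other than letters, digits, or
        # underscores with a single underscore, then trim stray underscores.
        new_name = re.sub(r"[^0-9A-Za-z_]+", "_", name.strip()).strip("_")
        cleaned.append(new_name)
    return cleaned


new_names = sanitize_column_names(["Company", "Sector", "Industry", "Altman Z-score"])
print(new_names)

# With a real DataFrame df, the names could be applied in one step, e.g.:
#   df = df.toDF(*sanitize_column_names(df.columns))
# and the renamed DataFrame registered as the temp table afterwards.
```

Two different original names can map to the same cleaned name (for example "A B" and "A-B"), so it is worth checking the result for duplicates before applying it.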
Rakesh Kumar answered Oct 11 '22 00:10