I registered a temp table from a DataFrame that has white space in a column header. How can I extract that column in a SQL query via sqlContext? I tried to use back-ticks but it is not working:
df1 = sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score as Z_Score` from tmp1 """)
To remove both leading and trailing spaces from a column in PySpark we use the trim() function. trim() takes a column (or column name) and strips white space from both ends. Spark also provides ltrim() and rtrim() to strip only the leading or only the trailing white space, as in the sketch below.
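A minimal sketch of the three functions; the DataFrame and its name column here are made-up examples, not data from the question:

from pyspark.sql import SparkSession
from pyspark.sql.functions import trim, ltrim, rtrim

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("  Alice  ",), (" Bob ",)], ["name"])

df.select(
    trim("name").alias("trimmed"),    # strips both leading and trailing spaces
    ltrim("name").alias("ltrimmed"),  # strips leading spaces only
    rtrim("name").alias("rtrimmed"),  # strips trailing spaces only
).show()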
You can get all the columns of a Spark DataFrame by using df.columns; it returns the column names as a list in Python (Array[String] in Scala).
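For example, to check the exact header that was registered (df here stands for whatever DataFrame was registered as tmp1):

print(df.columns)   # e.g. ['Company', 'Sector', 'Industry', 'Altman Z-score']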
Problem: in Spark or PySpark, how do you remove white spaces (blanks) in a DataFrame string column, similar to TRIM in SQL? In Spark and PySpark (Spark with Python) you can do this with pyspark.sql.functions.trim(). Spark provides several ways to handle white space in data. The most general is regexp_replace(), which can remove any whitespace, not just leading or trailing, but it is not always easy to use; for the common cases the simpler trim(), ltrim(), and rtrim() functions shown above are enough. A regexp_replace() sketch follows.
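A minimal regexp_replace() sketch that strips every whitespace character from a value; df and its name column are assumed example names:

from pyspark.sql.functions import regexp_replace

df.withColumn("name_no_spaces", regexp_replace("name", r"\s+", "")).show()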
To remove only the leading space of a column in PySpark we use the ltrim() function. ltrim() takes a column and trims the left white space from it: df_states = df_states.withColumn('states_Name', ltrim(df_states.state_name)). The resulting table will have the leading space removed.
Removing spaces from column names in pandas is not hard either: you can rewrite the column labels with str.replace(), optionally substituting another character such as an underscore for the space.
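A small pandas sketch; the DataFrame here is a made-up example:

import pandas as pd

pdf = pd.DataFrame({"Altman Z-score": [1.2], "Company Name": ["Acme"]})
pdf.columns = pdf.columns.str.replace(" ", "_")   # replace spaces with underscores
print(pdf.columns.tolist())                       # ['Altman_Z-score', 'Company_Name']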
You have to place only the column name within back-ticks, not its alias:
Without Alias:
df1 = sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score` as Z_Score from tmp1""")
With Alias:
df1 = sqlContext.sql("""select t1.Company, t1.Sector, t1.Industry, t1.`Altman Z-score` as Z_Score from tmp1 t1""")
There is a problem in the query; the corrected query is below (only the column name `Altman Z-score` is wrapped in back-ticks, the alias Z_Score is not):
df1 = sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score` as Z_Score from tmp1 """)
One more alternative:
import pyspark.sql.functions as F
df1 = sqlContext.sql("""select * from tmp1 """)
df1.select(F.col("Altman Z-score").alias("Z_Score")).show()
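Note that F.col("Altman Z-score") needs no back-ticks: the DataFrame API does not run the name through the SQL parser, so the space is not a problem.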