How do I lower-case the column names of a DataFrame, but not its values, using raw Spark SQL and DataFrame methods?
Input DataFrame (imagine I have hundreds of these columns in upper case):
NAME  | COUNTRY | SRC        | CITY        | DEBIT
--------------------------------------------------
"foo" | "NZ"    | salary     | "Auckland"  | 15.0
"bar" | "Aus"   | investment | "Melbourne" | 12.5
Target DataFrame:
name  | country | src        | city        | debit
--------------------------------------------------
"foo" | "NZ"    | salary     | "Auckland"  | 15.0
"bar" | "Aus"   | investment | "Melbourne" | 12.5
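If you are working in pandas rather than Spark, you can lower-case the column names with the str.lower() string method: take the column labels, convert them to lower case, and assign them back, for example df.columns = df.columns.str.lower(). By contrast, df["x"] = df["x"].str.lower() lower-cases the values in column x rather than its name, so it is not what the question asks for.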
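A minimal pandas sketch of the approach just described, assuming pandas is installed; the sample data simply mirrors the input table from the question. Only the column labels are changed, and the cell values keep their original case.

```python
import pandas as pd

# Sample data mirroring the question's input (upper-case column names).
df = pd.DataFrame(
    {
        "NAME": ["foo", "bar"],
        "COUNTRY": ["NZ", "Aus"],
        "SRC": ["salary", "investment"],
        "CITY": ["Auckland", "Melbourne"],
        "DEBIT": [15.0, 12.5],
    }
)

# Lower-case only the column labels (an Index of strings);
# the values in the cells are not touched.
df.columns = df.columns.str.lower()
print(df)
```

An equivalent spelling is df = df.rename(columns=str.lower), which returns a new DataFrame instead of modifying the labels in place.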
Java 8 solution to convert the column names to lower case:
import static org.apache.spark.sql.functions.col;
import java.util.Arrays;
import org.apache.spark.sql.Column;
// Alias each column to its lower-case name; the values are left unchanged.
df.select(Arrays.asList(df.columns()).stream().map(x -> col(x).as(x.toLowerCase())).toArray(size -> new Column[size])).show(false);
If you are using Scala, you can do the following:
import org.apache.spark.sql.functions._
df.select(df.columns.map(x => col(x).as(x.toLowerCase)): _*).show(false)
And if you are using PySpark, you can do the following:
from pyspark.sql import functions as F
df.select([F.col(x).alias(x.lower()) for x in df.columns]).show()
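In PySpark you can also call df.withColumnRenamed(col_name, col_name.lower()) for each column. withColumnRenamed returns a new DataFrame, so reassign it on every iteration: for col_name in df.columns: df = df.withColumnRenamed(col_name, col_name.lower())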
How about this:
Some fake data:
scala> val df = spark.sql("select 'A' as AA, 'B' as BB")
df: org.apache.spark.sql.DataFrame = [AA: string, BB: string]
scala> df.show()
+---+---+
| AA| BB|
+---+---+
| A| B|
+---+---+
Now re-select all columns with a new name, which is just their lower-case version:
scala> val cols = df.columns.map(c => s"$c as ${c.toLowerCase}")
cols: Array[String] = Array(AA as aa, BB as bb)
scala> val lowerDf = df.selectExpr(cols:_*)
lowerDf: org.apache.spark.sql.DataFrame = [aa: string, bb: string]
scala> lowerDf.show()
+---+---+
| aa| bb|
+---+---+
| A| B|
+---+---+
Note: I use Scala. If you use PySpark and are not familiar with the Scala syntax, then df.columns.map(c => s"$c as ${c.toLowerCase}") becomes [f"{c} as {c.lower()}" for c in df.columns] in Python, and cols:_* becomes *cols (that is, df.selectExpr(*cols)). Please note I didn't run this translation.
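Because selectExpr parses each string as a SQL expression, a plain "$c as ..." string breaks on column names that contain spaces or other special characters. Below is a minimal pure-Python sketch that backtick-quotes each name (doubling any embedded backticks) and builds either the expression list for selectExpr or a complete SELECT statement for raw Spark SQL. The helper names quote_ident, lowercase_select_exprs and lowercase_select_sql, and the view name input_data, are hypothetical; the code only builds strings, so it runs without Spark.

```python
def quote_ident(name: str) -> str:
    """Wrap a name in backticks, doubling any embedded backticks,
    so names with spaces or special characters parse as one identifier."""
    return "`" + name.replace("`", "``") + "`"


def lowercase_select_exprs(columns):
    """Build one 'old AS new' expression per column (for DataFrame.selectExpr)."""
    return [f"{quote_ident(c)} AS {quote_ident(c.lower())}" for c in columns]


def lowercase_select_sql(columns, view_name):
    """Build a full SELECT statement that renames every column to lower case.
    Assumes view_name is a simple, already-registered temporary view name."""
    return "SELECT " + ", ".join(lowercase_select_exprs(columns)) + " FROM " + view_name


cols = ["NAME", "COUNTRY", "SRC", "CITY", "DEBIT"]
print(lowercase_select_exprs(cols))
print(lowercase_select_sql(cols, "input_data"))
```

With PySpark (not run here), the strings would be used as lower_df = df.selectExpr(*lowercase_select_exprs(df.columns)), or, for raw SQL, df.createOrReplaceTempView("input_data") followed by spark.sql(lowercase_select_sql(df.columns, "input_data")).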