 

How to lowercase the column names of a DataFrame but not its values?

How can I lowercase the column names of a DataFrame, but not its values, using raw Spark SQL and DataFrame methods?

Input DataFrame (imagine I have hundreds of these columns in uppercase):

NAME | COUNTRY | SRC        | CITY       | DEBIT
------------------------------------------------
"foo"| "NZ"    | salary     | "Auckland" | 15.0
"bar"| "Aus"   | investment | "Melbourne"| 12.5

Target DataFrame:

name | country | src        | city       | debit
------------------------------------------------
"foo"| "NZ"    | salary     | "Auckland" | 15.0
"bar"| "Aus"   | investment | "Melbourne"| 12.5
asked Feb 07 '18 by user1870400



4 Answers

A Java 8 solution to convert the column names to lower case:

import static org.apache.spark.sql.functions.col;

import java.util.Arrays;
import org.apache.spark.sql.Column;

// Re-select every column, aliasing each one to its lower-case name.
df.select(Arrays.stream(df.columns())
        .map(x -> col(x).as(x.toLowerCase()))
        .toArray(Column[]::new))
  .show(false);
answered by abaghel


If you are using Scala, you can simply do the following:

import org.apache.spark.sql.functions._
df.select(df.columns.map(x => col(x).as(x.toLowerCase)): _*).show(false)

And if you are using PySpark, the equivalent is:

from pyspark.sql import functions as F
df.select([F.col(x).alias(x.lower()) for x in df.columns]).show()
answered by Ramesh Maharjan


You can use df.withColumnRenamed(col_name, col_name.lower()) for a Spark DataFrame in Python, as sketched below.
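
Since withColumnRenamed renames one column per call, you would fold it over all of the columns. A minimal PySpark sketch (the sample DataFrame here is made up for illustration):

from functools import reduce
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("foo", "NZ", 15.0)], ["NAME", "COUNTRY", "DEBIT"])

# Each withColumnRenamed call returns a new DataFrame, so fold over every column name.
df = reduce(lambda d, c: d.withColumnRenamed(c, c.lower()), df.columns, df)
df.show()

With hundreds of columns, a single select with aliases (as in the other answers) is usually cheaper than chaining that many renames, since each call adds another projection to the query plan.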

answered by Harshit Mehta


How about this:

Some fake data:

scala> val df = spark.sql("select 'A' as AA, 'B' as BB")
df: org.apache.spark.sql.DataFrame = [AA: string, BB: string]

scala> df.show()
+---+---+
| AA| BB|
+---+---+
|  A|  B|
+---+---+

Now re-select all columns with a new name, which is just their lower-case version:

scala> val cols = df.columns.map(c => s"$c as ${c.toLowerCase}")
cols: Array[String] = Array(AA as aa, BB as bb)

scala> val lowerDf = df.selectExpr(cols:_*)
lowerDf: org.apache.spark.sql.DataFrame = [aa: string, bb: string]

scala> lowerDf.show()
+---+---+
| aa| bb|
+---+---+
|  A|  B|
+---+---+

Note: I use Scala. If you use PySpark and are not familiar with the Scala syntax, then df.columns.map(c => s"$c as ${c.toLowerCase}") is ["{} as {}".format(c, c.lower()) for c in df.columns] in Python, and selectExpr(cols:_*) becomes selectExpr(*cols).
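A runnable sketch of that translation, assuming a local SparkSession and the same fake data:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.sql("select 'A' as AA, 'B' as BB")

# Build "<OLD> as <old>" SQL expressions, then re-select each column under its lower-case name.
cols = ["{} as {}".format(c, c.lower()) for c in df.columns]
lower_df = df.selectExpr(*cols)
lower_df.show()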

answered by shakedzy