How to lower the case of column names of a data frame but not its values?

Tags: apache-spark, apache-spark-sql, apache-spark-dataset

How can I lowercase the column names of a DataFrame, but not its values, using raw Spark SQL and DataFrame methods?

Input DataFrame (imagine I have hundreds of these columns in uppercase):

NAME | COUNTRY | SRC        | CITY       | DEBIT
---------------------------------------------
"foo"| "NZ"    | salary     | "Auckland" | 15.0
"bar"| "Aus"   | investment | "Melbourne"| 12.5

Target DataFrame:

name | country | src        | city       | debit
------------------------------------------------
"foo"| "NZ"    | salary     | "Auckland" | 15.0
"bar"| "Aus"   | investment | "Melbourne"| 12.5

asked Feb 07 '18 23:02

user1870400

4 Answers

Java 8 solution to convert the column names to lower case.

import static org.apache.spark.sql.functions.col;
import java.util.Arrays;
import org.apache.spark.sql.Column;

// Alias every column to its lower-case name and select them in one pass.
df.select(Arrays.stream(df.columns()).map(x -> col(x).as(x.toLowerCase())).toArray(Column[]::new)).show(false);

answered Oct 14 '22 22:10

abaghel

If you are using Scala, you can simply do the following:

import org.apache.spark.sql.functions._
df.select(df.columns.map(x => col(x).as(x.toLowerCase)): _*).show(false)

And if you are using PySpark, you can simply do the following:

from pyspark.sql import functions as F
df.select([F.col(x).alias(x.lower()) for x in df.columns]).show()

answered Oct 14 '22 20:10

Ramesh Maharjan

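In PySpark, you can call df.withColumnRenamed(col_name, col_name.lower()) once for each name in df.columns. The method returns a new DataFrame instead of modifying df, so reassign the result each time.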
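The per-column rename can be applied to every column with a small fold over the column names. The following is a minimal sketch, assuming `df` is a PySpark DataFrame; the helper name `lowercase_columns_by_rename` is hypothetical and not part of the PySpark API.

```python
from functools import reduce


def lowercase_columns_by_rename(df):
    """Return a DataFrame whose column names are lower-cased; values are untouched.

    Hypothetical helper: it relies only on `df.columns` and
    `df.withColumnRenamed(existing, new)`, both of which PySpark DataFrames provide.
    """
    # withColumnRenamed returns a new DataFrame rather than mutating `df`,
    # so thread the result through one rename per column.
    return reduce(
        lambda acc, name: acc.withColumnRenamed(name, name.lower()),
        df.columns,
        df,
    )
```

Usage would be `lower_df = lowercase_columns_by_rename(df)`. This issues one rename per column, whereas the select-based answers rename everything in a single projection.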

answered Oct 14 '22 20:10

Harshit Mehta

How about this:

Some fake data:

scala> val df = spark.sql("select 'A' as AA, 'B' as BB")
df: org.apache.spark.sql.DataFrame = [AA: string, BB: string]

scala> df.show()
+---+---+
| AA| BB|
+---+---+
|  A|  B|
+---+---+

Now re-select all columns with a new name, which is just their lower-case version:

scala> val cols = df.columns.map(c => s"$c as ${c.toLowerCase}")
cols: Array[String] = Array(AA as aa, BB as bb)

scala> val lowerDf = df.selectExpr(cols:_*)
lowerDf: org.apache.spark.sql.DataFrame = [aa: string, bb: string]

scala> lowerDf.show()
+---+---+
| aa| bb|
+---+---+
|  A|  B|
+---+---+

Note: I use Scala. If you use PySpark and are not familiar with the Scala syntax, then df.columns.map(c => s"$c as ${c.toLowerCase}") is [c + " as " + c.lower() for c in df.columns] in Python, and cols:_* becomes *cols. Please note I didn't run this translation.
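For PySpark readers, the selectExpr approach might look roughly like the sketch below. It assumes the column names are plain identifiers (no spaces, dots, or backticks), because the expressions are built by simple string formatting just as in the Scala version. Both helper names are hypothetical.

```python
def lowercase_alias_exprs(columns):
    """Build one 'OLD as new' expression string per column name."""
    return [f"{c} as {c.lower()}" for c in columns]


def lowercase_columns_by_select_expr(df):
    """Re-select every column of `df` under its lower-case name."""
    # selectExpr takes SQL expression strings as varargs; *exprs unpacks the
    # list, which plays the role of Scala's cols:_*.
    exprs = lowercase_alias_exprs(df.columns)
    return df.selectExpr(*exprs)
```

For the two-column example, `lowercase_alias_exprs(["AA", "BB"])` yields the same strings as the Scala `cols` array: `"AA as aa"` and `"BB as bb"`.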

answered Oct 14 '22 22:10

shakedzy
