I want to add a new column to a Dataframe, a UUID generator.
UUID value will look something like 21534cf7-cff9-482a-a3a8-9e7244240da7
My Research:
I've tried with withColumn
method in spark.
val DF2 = DF1.withColumn("newcolname", DF1("existingcolname" + 1)
So DF2 will have additional column with newcolname
with 1 added to it in all rows.
By my requirement is that I want to have a new column which can generate the UUID.
A UUID (Universal Unique Identifier) is a 128-bit value used to uniquely identify an object or entity on the internet. Depending on the specific mechanisms used, a UUID is either guaranteed to be different or is, at least, extremely likely to be different from any other UUID generated until A.D. 3400.
In PySpark, to add a new column to DataFrame use lit() function by importing from pyspark. sql. functions import lit , lit() function takes a constant value you wanted to add and returns a Column type, if you wanted to add a NULL / None use lit(None) .
You can use the assign() function to add a new column to the end of a pandas DataFrame: df = df. assign(col_name=[value1, value2, value3, ...])
You can utilize built-in Spark SQL uuid function:
.withColumn("uuid", expr("uuid()"))
A full example in Scala:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
object CreateDf extends App {
val spark = SparkSession.builder
.master("local[*]")
.appName("spark_local")
.getOrCreate()
import spark.implicits._
Seq(1, 2, 3).toDF("col1")
.withColumn("uuid", expr("uuid()"))
.show(false)
}
Output:
+----+------------------------------------+
|col1|uuid |
+----+------------------------------------+
|1 |24181c68-51b7-42ea-a9fd-f88dcfa10062|
|2 |7cd21b25-017e-4567-bdd3-f33b001ee497|
|3 |1df7cfa8-af8a-4421-834f-5359dc3ae417|
+----+------------------------------------+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With