How to add an incremental ID column to a table in Spark SQL

Tags: apache-spark, apache-spark-sql, spark-dataframe, apache-spark-mllib

I'm working on a Spark MLlib algorithm. The dataset I have is in this form:

"Company":"XXXX","CurrentTitle":"XYZ","Edu_Title":"ABC","Exp_mnth":... (there are more values similar to these)

I'm trying to encode the string values as numeric values, so I tried using zipWithUniqueId to get a unique value for each of the string values. For some reason I'm not able to save the modified dataset to disk. Can I do this in some way using Spark SQL, or is there a better approach?

220

asked Jul 14 '16 14:07

KM-Yash

1 Answer

Scala

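import org.apache.spark.sql.functions.monotonically_increasing_id
val dataFrame1 = dataFrame0.withColumn("index", monotonically_increasing_id()) // unique, increasing, not consecutive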

Java

import org.apache.spark.sql.functions;
// Adds a 64-bit "index" column whose values are unique and increasing, but not consecutive
Dataset<Row> dataFrame1 = dataFrame0.withColumn("index", functions.monotonically_increasing_id());
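
Note that the generated IDs are unique and increasing but not consecutive. The plain-Java sketch below (no Spark required) illustrates why, assuming the layout described in the Spark documentation for the current implementation: the partition index goes in the upper 31 bits and the row position within the partition in the lower 33 bits. The class name MonotonicIdSketch and the helper idFor are hypothetical, introduced only for illustration; they are not part of Spark.

```java
public class MonotonicIdSketch {
    // Assumed layout (per the Spark docs for the current implementation):
    // upper 31 bits = partition index, lower 33 bits = row position in that partition.
    static long idFor(int partitionIndex, long rowInPartition) {
        return ((long) partitionIndex << 33) + rowInPartition;
    }

    public static void main(String[] args) {
        // Two hypothetical partitions with three rows each.
        for (int p = 0; p < 2; p++) {
            for (long r = 0; r < 3; r++) {
                System.out.println("partition " + p + ", row " + r + " -> " + idFor(p, r));
            }
        }
    }
}
```

Under that assumption, rows in partition 0 get 0, 1, 2 and rows in partition 1 start at 8589934592, so there is a large gap between partitions. If strictly consecutive values are needed, one option may be RDD.zipWithIndex, or a row_number() window function over an explicit ordering, at the cost of extra computation.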

147

answered Oct 14 '22 14:10

Yugerten
