 

Spark DataFrame: How to add an index column (aka a distributed data index)

I read data from a CSV file, but it doesn't have an index.

I want to add a column that goes from 1 to the number of rows.

What should I do? Thanks. (Scala)

Asked by Liangpi on Apr 14 '17

People also ask

How do you add a column in PySpark DataFrame at a specific position?

In PySpark, to add a new column to a DataFrame you can use the lit() function, imported with from pyspark.sql.functions import lit. lit() takes the constant value you want to add and returns a Column type; if you want to add a NULL/None value, use lit(None).
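The equivalent lit() function also exists in the Scala API. A minimal sketch, assuming a hypothetical DataFrame df with columns "a" and "b":

import org.apache.spark.sql.functions.lit

// add a constant column and a null column (the column names are just for illustration)
val withFlag = df.withColumn("flag", lit(1))
val withNote = df.withColumn("note", lit(null).cast("string"))

// withColumn always appends the new column at the end; use select
// to place it at a specific position
val reordered = withFlag.select("a", "flag", "b")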

How do I add a column in Spark Dataset?

A new column can be added to an existing Dataset using the Dataset.withColumn() method. withColumn() accepts two arguments, the name of the column to be added and a Column expression, and returns a new Dataset&lt;Row&gt;. The general form of withColumn() is shown below.
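A minimal Scala sketch, assuming a hypothetical Dataset ds with a numeric price column:

import org.apache.spark.sql.functions.col

// withColumn(name, expression) returns a new Dataset with the extra column;
// the original Dataset is left unchanged
val withTax = ds.withColumn("priceWithTax", col("price") * 1.2)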

Does Spark support indexing?

Long story short, Spark absolutely does support the right kind of indexing -- the ability to create complicated derived data from raw data to make future uses more efficient.

How do I use indexing in PySpark?

Indexing and accessing in a PySpark DataFrame: there is an alternative way to do that in PySpark, by creating a new column "index". Then we can use the .filter() function on our "index" column to access a row by its index.
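The same idea in Scala, assuming a hypothetical DataFrame df:

import org.apache.spark.sql.functions.{col, monotonically_increasing_id}

// add an "index" column, then filter on it to pull out a specific row
// (with a single partition the generated ids are 0, 1, 2, ...;
// for a strictly consecutive index see the row_number() answer below)
val indexed = df.withColumn("index", monotonically_increasing_id())
val thirdRow = indexed.filter(col("index") === 2)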


2 Answers

With Scala you can use:

import org.apache.spark.sql.functions._

// monotonicallyIncreasingId is deprecated since Spark 2.0; use monotonically_increasing_id()
df.withColumn("id", monotonically_increasing_id())

You can refer to this example and the Scala docs.

With PySpark you can use:

from pyspark.sql.functions import monotonically_increasing_id

df_index = df.select("*").withColumn("id", monotonically_increasing_id())
Answered by Omar14 on Sep 22 '22

monotonically_increasing_id - The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive.
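A quick sketch of why the values are not consecutive, assuming an active SparkSession named spark:

import org.apache.spark.sql.functions.monotonically_increasing_id

// spark.range(6) already has a column "id", so the generated column gets a different name
val demo = spark.range(6).repartition(3)
demo.withColumn("gen_id", monotonically_increasing_id()).show()
// The implementation puts the partition ID in the upper 31 bits and the record
// number within each partition in the lower 33 bits, so each partition's IDs
// start at partitionId << 33 rather than continuing a single 1..N sequence.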

"I want to add a column from 1 to row's number."

Let's say we have the following DataFrame:

+--------+-------------+-------+
| userId | productCode | count |
+--------+-------------+-------+
|     25 |        6001 |     2 |
|     11 |        5001 |     8 |
|     23 |         123 |     5 |
+--------+-------------+-------+

To generate the IDs starting from 1:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

val w = Window.orderBy("count")
val result = df.withColumn("index", row_number().over(w))

This would add an index column ordered by increasing value of count.

+--------+-------------+-------+-------+
| userId | productCode | count | index |
+--------+-------------+-------+-------+
|     25 |        6001 |     2 |     1 |
|     23 |         123 |     5 |     2 |
|     11 |        5001 |     8 |     3 |
+--------+-------------+-------+-------+
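For reference, a minimal self-contained version of this approach, assuming an active SparkSession named spark and the sample data above:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
import spark.implicits._

// recreate the sample DataFrame from the answer
val df = Seq(
  (25, 6001, 2),
  (11, 5001, 8),
  (23, 123, 5)
).toDF("userId", "productCode", "count")

// consecutive index starting at 1, ordered by count
val w = Window.orderBy("count")
df.withColumn("index", row_number().over(w)).show()

Note that a window with no partitionBy pulls all rows into a single partition to establish the global ordering, which is fine here but can become a bottleneck on large data.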
Answered by anshu kumar on Sep 22 '22