I have a DataFrame called 'df' like the following:
+-------+-------+-------+
| Atr1 | Atr2 | Atr3 |
+-------+-------+-------+
| A | A | A |
+-------+-------+-------+
| B | A | A |
+-------+-------+-------+
| C | A | A |
+-------+-------+-------+
I want to add a new column to it with incremental values and get the following updated DataFrame:
+-------+-------+-------+-------+
| Atr1 | Atr2 | Atr3 | Atr4 |
+-------+-------+-------+-------+
| A | A | A | 1 |
+-------+-------+-------+-------+
| B | A | A | 2 |
+-------+-------+-------+-------+
| C | A | A | 3 |
+-------+-------+-------+-------+
How can I do this?
If you only need incremental values (like an ID) and the numbers do not have to be consecutive, you could use monotonically_increasing_id(). The only guarantee this function gives is that the values are unique and increase from row to row; the values themselves can differ between executions.
from pyspark.sql.functions import monotonically_increasing_id
# withColumn returns a new DataFrame, so assign the result back to df
df = df.withColumn("Atr4", monotonically_increasing_id())
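To see why the values are increasing but not necessarily consecutive, here is a rough plain-Python sketch of the ID layout that the PySpark documentation describes for the current implementation: the partition index goes in the upper 31 bits and the row's position within its partition in the lower 33 bits. This is only an illustration, not Spark's own code, and sketch_ids is a hypothetical helper made up for this example.

```python
# Illustration only: mimics the documented ID layout (partition index in the
# upper 31 bits, row position within the partition in the lower 33 bits).
# sketch_ids is a hypothetical helper, not part of PySpark.

def sketch_ids(rows_per_partition):
    """Return the IDs each row would get, given the row count of each partition."""
    ids = []
    for partition_index, row_count in enumerate(rows_per_partition):
        base = partition_index << 33  # partition index occupies the upper bits
        ids.extend(base + position for position in range(row_count))
    return ids

# Three rows that all sit in one partition.
single = sketch_ids([3])

# The same three rows split across two partitions (2 rows + 1 row).
split = sketch_ids([2, 1])

print(single)  # [0, 1, 2]
print(split)   # [0, 1, 8589934592]
```

Under this layout the IDs start at 0 rather than 1, and they are only gap-free when all rows happen to sit in a single partition. If you really need 1, 2, 3, ... as in the question, you would probably need a different approach, for example row_number() over an ordered window, which has its own performance trade-offs on large data.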