I have a pyspark data frame like:
+--------+-------+-------+
| col1 | col2 | col3 |
+--------+-------+-------+
| 25 | 01 | 2 |
| 23 | 12 | 5 |
| 11 | 22 | 8 |
+--------+-------+-------+
and I want to create new dataframe by adding a new column like this:
+--------------+-------+-------+-------+
| new_column | col1 | col2 | col3 |
+--------------+-------+-------+-------+
| 0 | 25 | 01 | 2 |
| 0 | 23 | 12 | 5 |
| 0 | 11 | 22 | 8 |
+--------------+-------+-------+-------+
I know I can add a column with:
from pyspark.sql.functions import lit
df.withColumn("new_column", lit(0))
but it adds the column at the end, like this:
+-------+-------+-------+-------------+
| col1 | col2 | col3 | new_column |
+-------+-------+-------+-------------+
| 25 | 01 | 2 | 0 |
| 23 | 12 | 5 | 0 |
| 11 | 22 | 8 | 0 |
+-------+-------+-------+-------------+
You can reorder the columns using select:
df = df.select('new_column', 'col1', 'col2', 'col3')
df.show()
You can always reorder the columns in a Spark DataFrame using select, as shown in this post. In this case, you can also achieve the desired output in one step using select and alias as follows:
df = df.select(lit(0).alias("new_column"), "*")
Which is logically equivalent to the following SQL code:
SELECT 0 AS new_column, * FROM df