I have a data frame in Python/PySpark with columns id, time, city, zip and so on. Now I have added a new column, name, to this data frame, and I need to arrange the columns so that the name column comes right after id.
I have done it like below:

change_cols = ['id', 'name']
cols = ([col for col in change_cols if col in df] +
        [col for col in df if col not in change_cols])
df = df[cols]
I am getting this error:

pyspark.sql.utils.AnalysisException: u"Reference 'id' is ambiguous, could be: id#609, id#1224.;"

Why is this error occurring, and how can I rectify it?
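The two references in the message (id#609 and id#1224) indicate that the DataFrame carries two columns named id, which typically happens after a join. A quick way to check for duplicate names before reordering; this snippet is only an illustrative sketch, not from the original post:

from collections import Counter

# List every column name that appears more than once in the schema
duplicate_names = [name for name, count in Counter(df.columns).items() if count > 1]
print(duplicate_names)  # e.g. ['id'] if two 'id' columns exist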
You can use select to change the order of the columns:

df.select("id","name","time","city")
If you're working with a large number of columns:
df.select(sorted(df.columns))
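If you want name to sit immediately after id without listing every column by hand, you can also build the column order programmatically and pass it to select. This is a minimal sketch, assuming df already contains both id and name and has no duplicate column names:

cols = [c for c in df.columns if c != 'name']   # remove 'name' from its current position
insert_at = cols.index('id') + 1                # slot directly after 'id'
new_order = cols[:insert_at] + ['name'] + cols[insert_at:]
df = df.select(new_order)                       # select() accepts a list of column names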