I have a DataFrame and want to rename its columns using toDF, passing the new column names from a list. The column list is dynamic. When I do it as below I get an error. How can I achieve this?
>>> df.printSchema()
root
|-- id: long (nullable = true)
|-- name: string (nullable = true)
|-- dept: string (nullable = true)
columns = ['NAME_FIRST', 'DEPT_NAME']
df2 = df.toDF('ID', 'NAME_FIRST', 'DEPT_NAME')
(or)
df2 = df.toDF('id', columns[0], columns[1])
This does not work if we don't know how many columns the input DataFrame will have, so I want to pass the list to toDF instead. I tried the following:
df2 = df.toDF('id', columns)
pyspark.sql.utils.IllegalArgumentException: u"requirement failed: The number of columns doesn't match.\nOld column names (3): id, name, dept\nNew column names (2): id, name_first, dept_name"
Here the list is treated as a single item. How do I pass the column names from the list?
Passing the list itself (df.toDF(columns), or df.toDF('id', columns) as in the question) does not work. Put a * in front of the list instead:
columns = ['NAME_FIRST', 'DEPT_NAME']
df2 = df.toDF('id', *columns)
The * is the "splat" (argument-unpacking) operator: it takes a list (or any iterable) and expands its elements into separate positional arguments of the function call, exactly as if they had been written out one by one.
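To see why the unsplatted call fails, here is a minimal sketch that runs without Spark. The function to_df is a hypothetical stand-in for DataFrame.toDF(*cols): it only counts the positional arguments it receives and raises if the count differs from the old schema (assumed here to be id, name, dept, as in the question), which mirrors the "number of columns doesn't match" check Spark performs.

```python
# Hypothetical stand-in for DataFrame.toDF(*cols): it records the positional
# arguments it receives and checks their count, the way Spark checks that the
# number of new names matches the number of existing columns.
OLD_COLUMNS = ["id", "name", "dept"]  # assumed schema, taken from the question


def to_df(*cols):
    """Return the new column names, or raise if the count does not match."""
    if len(cols) != len(OLD_COLUMNS):
        raise ValueError(
            f"number of columns doesn't match: old {len(OLD_COLUMNS)}, new {len(cols)}"
        )
    return list(cols)


columns = ["NAME_FIRST", "DEPT_NAME"]

# Without the splat, the list arrives as ONE argument: ('id', [...]) is 2 args.
try:
    to_df("id", columns)
except ValueError as err:
    print(err)  # number of columns doesn't match: old 3, new 2

# With the splat, the list is expanded: ('id', 'NAME_FIRST', 'DEPT_NAME') is 3 args.
print(to_df("id", *columns))  # ['id', 'NAME_FIRST', 'DEPT_NAME']

# When the column count is not known in advance, build the full name list
# first (here: keep the first existing name, rename the rest) and splat it.
new_names = OLD_COLUMNS[:1] + columns
print(to_df(*new_names))  # ['id', 'NAME_FIRST', 'DEPT_NAME']
```

With a real DataFrame the same idea reads df.toDF(df.columns[0], *columns), since df.columns returns the current column names as a Python list. The list passed after the * must still supply exactly one name per remaining column.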