
How to fix 'DataFrame' object has no attribute 'coalesce'?

In a PySpark application, I tried to transpose a DataFrame by converting it to pandas, and I then want to write the result to a CSV file. This is how I am doing it:

df = df.toPandas().set_index("s").transpose()
df.coalesce(1).write.option("header", True).option("delimiter", ",").csv('dataframe')

When executing this script, I get the following error:

'DataFrame' object has no attribute 'coalesce'

What is the problem? How can I fix it?

Mehdi Ben Hamida asked Oct 30 '22 01:10



1 Answer

The problem is that you converted the Spark DataFrame into a pandas DataFrame. A pandas DataFrame does not have a coalesce method; coalesce and .write belong to the Spark DataFrame API. The pandas documentation lists the methods that are available.

When you call toPandas(), the DataFrame is collected into memory on the driver, so there are no partitions left to coalesce. Use the pandas DataFrame method df.to_csv(path) instead.
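As a minimal sketch of that suggestion, the snippet below writes a transposed pandas DataFrame with to_csv. It uses pandas only, so the frame named pdf is a stand-in for whatever toPandas() would return. The column "s" comes from the question; the other column names, the values, and the output path are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the result of spark_df.toPandas(). Only the column "s" is
# taken from the question; "x", "y" and all values are hypothetical.
pdf = pd.DataFrame({"s": ["a", "b"], "x": [1, 2], "y": [3, 4]})

# Same transformation as in the question: index on "s", then transpose.
transposed = pdf.set_index("s").transpose()

# A pandas DataFrame has no coalesce() or .write; to_csv writes a single
# file directly. The temporary directory is an assumption for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), "dataframe.csv")
transposed.to_csv(out_path, sep=",", header=True, index=True)
```

Note that to_csv produces one plain file at the given path, whereas Spark's .csv() writer produces a directory of part files. That is why coalesce(1) is not needed once the data is in pandas.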

Shaido answered Nov 15 '22 07:11
