'GroupedData' object has no attribute 'show' when doing doing pivot in spark dataframe

Question

I want to pivot a spark dataframe, I refer pyspark documentation, and based on pivot function, the clue is .groupBy('name').pivot('name', values=None). Here's my dataset,

 In[75]:  spDF.show()
 Out[75]:

+-----------+-----------+
|customer_id|       name|
+-----------+-----------+
|      25620| MCDonnalds|
|      25620|  STARBUCKS|
|      25620|        nan|
|      25620|        nan|
|      25620| MCDonnalds|
|      25620|        nan|
|      25620| MCDonnalds|
|      25620|DUNKINDONUT|
|      25620|   LOTTERIA|
|      25620|        nan|
|      25620| MCDonnalds|
|      25620|DUNKINDONUT|
|      25620|DUNKINDONUT|
|      25620|        nan|
|      25620|        nan|
|      25620|        nan|
|      25620|        nan|
|      25620|   LOTTERIA|
|      25620|   LOTTERIA|
|      25620|  STARBUCKS|
+-----------+-----------+
only showing top 20 rows

And then I try to di pivot the table name

In [96]:
spDF.groupBy('name').pivot('name', values=None)
Out[96]:
<pyspark.sql.group.GroupedData at 0x7f0ad03750f0>

And when I try to show them

In [98]:
spDF.groupBy('name').pivot('name', values=None).show()
Out [98]:

    ---------------------------------------------------------------------------
AttributeError                       Traceback (most recent call last)
<ipython-input-98-94354082e956> in <module>()
----> 1 spDF.groupBy('name').pivot('name', values=None).show()
AttributeError: 'GroupedData' object has no attribute 'show'

I don't know why 'GroupedData' can't be shown, what should I do to solve the issue?

ech0 · Accepted Answer

The pivot() method returns a GroupedData object, just like groupBy(). You cannot use show() on a GroupedData object without using an aggregate function (such as sum() or even count()) on it before.

See this article for more information

Powers · Answer

Let's create some test data that resembles your dataset:

data = [
    ("123", "McDonalds"),
    ("123", "Starbucks"),
    ("123", "McDonalds"),
    ("777", "McDonalds"),
    ("777", "McDonalds"),
    ("777", "Dunkin")
]
df = spark.createDataFrame(data, ["customer_id", "name"])
df.show()

+-----------+---------+
|customer_id|     name|
+-----------+---------+
|        123|McDonalds|
|        123|Starbucks|
|        123|McDonalds|
|        777|McDonalds|
|        777|McDonalds|
|        777|   Dunkin|
+-----------+---------+

Let's pivot the dataset so the customer_ids are columns:

df.groupBy("name").pivot("customer_id").count().show()

+---------+----+----+
|     name| 123| 777|
+---------+----+----+
|McDonalds|   2|   2|
|Starbucks|   1|null|
|   Dunkin|null|   1|
+---------+----+----+

Now let's pivot the DataFrame so the restaurant names are columns:

df.groupBy("customer_id").pivot("name").count().show()

+-----------+------+---------+---------+
|customer_id|Dunkin|McDonalds|Starbucks|
+-----------+------+---------+---------+
|        777|     1|        2|     null|
|        123|  null|        2|        1|
+-----------+------+---------+---------+

Code like df.groupBy("name").show() errors out with the AttributeError: 'GroupedData' object has no attribute 'show' message. You can only call methods defined in the pyspark.sql.GroupedData class on instances of the GroupedData class.

'GroupedData' object has no attribute 'show' when doing doing pivot in spark dataframe

Tags:

python

pandas

dataframe

apache-spark

pyspark

Nabih Bawazir

Video Answer

2 Answers

ech0

Powers

Recent Activity

Donate For Us

'GroupedData' object has no attribute 'show' when doing doing pivot in spark dataframe

Tags:

python

pandas

dataframe

apache-spark

pyspark

Nabih Bawazir

Video Answer

2 Answers

ech0

Powers

Related questions

Recent Activity

Donate For Us