I have a data frame with three columns and I am trying to draw a line plot with the Seaborn library, but it throws an error saying that 'DataFrame' object has no attribute 'get'. Here is my test data frame:
Age variable value
31 Overall 69.76751118
31 Potential 69.76751118
31 Growth 0
34 Overall 68.91176471
34 Potential 68.91176471
34 Growth 0
28 Overall 69.05803996
28 Potential 69.05803996
28 Growth 0.24643197
This is what I am trying to do with the seaborn line plot after reading in the CSV file:
test = spark.read.csv("test.csv", inferSchema=True, header=True)
sns.lineplot(x = "Age", y = "value", hue = "variable", data = test)
And this is the error that I get:
AttributeError: 'DataFrame' object has no attribute 'get'
However, when I convert the data frame to a Pandas data frame and use exactly the same seaborn code, it works:
test_df = test.toPandas()
sns.lineplot(x = "Age", y = "value", hue = "variable", data = test_df)
Am I doing anything wrong with Spark data frames?
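Several open source tools exist to aid visualization in Python, such as matplotlib, Seaborn and Bokeh. However, none of these visualization tools can be used directly with PySpark's DataFrames.
A Spark DataFrame and a pandas DataFrame share a lot of functionality, but they differ in where and how they hold data: a Spark DataFrame is distributed across the cluster and evaluated lazily, while a pandas DataFrame lives entirely in local memory. Seaborn expects a pandas-like object, and the error shows it looking up a get method that a Spark DataFrame does not have.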
This step is correct:
test_df = test.toPandas()
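You will always need to collect the data into a pandas DataFrame before you can plot it with seaborn (or even matplotlib). Keep in mind that toPandas() pulls every row to the driver, so the result has to fit in the driver's memory.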
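As a rough sketch of that conversion step, a small helper could wrap it so the same plotting code accepts either kind of data frame. The name to_pandas_for_plot and the max_rows default are made up for illustration and are not part of Spark or seaborn; the only Spark calls used are DataFrame.limit() and DataFrame.toPandas(). Note that limit() simply truncates the data, so for a large data set it may be better to aggregate or sample in Spark first.

```python
def to_pandas_for_plot(df, max_rows=100_000):
    """Return something seaborn/matplotlib can plot.

    Hypothetical helper: a Spark DataFrame (anything exposing toPandas())
    is collected to the driver as a pandas DataFrame; any other object,
    such as an existing pandas DataFrame, is returned unchanged.
    """
    if hasattr(df, "toPandas"):
        # limit() caps how many rows are pulled to the driver;
        # max_rows is an arbitrary safety value, adjust as needed.
        return df.limit(max_rows).toPandas()
    return df


# Usage with the question's data (assumes an active SparkSession named
# `spark` and seaborn imported as `sns`):
#   test = spark.read.csv("test.csv", inferSchema=True, header=True)
#   sns.lineplot(x="Age", y="value", hue="variable",
#                data=to_pandas_for_plot(test))
```

With only nine rows, as in the test data above, the limit has no effect and this is equivalent to calling test.toPandas() directly.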