I have the following dataframe.
In [12]: dfFinal
Out[12]: 
           module                                            vectime                                           vecvalue
1906  client1.tcp  [1.1007512, 1.1015024, 1.1022536, 1.1030048, 1...  [0.0007512, 0.0007512, 0.0007512, 0.0007512, 0...
1912  client2.tcp  [1.10079784, 1.10159568, 1.10239352, 1.1031913...  [0.00079784, 0.00079784, 0.00079784, 0.0007978...
1918  client3.tcp  [1.10084448, 1.10168896, 1.10258008, 1.1036111...  [0.00084448, 0.00084448, 0.00089112, 0.0010310...
I want to plot the time series vecvalue vs. vectime for each module.
The result is the following:

I can do this in two ways:
1) Matplotlib
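import datetime
from matplotlib.pyplot import plot

start = datetime.datetime.now()
for row in dfFinal.itertuples():
    t = row.vectime
    x = row.vecvalue
    x = runningAvg(x)  # runningAvg is my own smoothing helper (not shown)
    plot(t, x)         # one line per module
total = (datetime.datetime.now() - start).total_seconds()
print("Total time: ", total)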
This takes 0.07005 seconds.
2) Seaborn
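import datetime
import pandas as pd
import seaborn as sns

start = datetime.datetime.now()
for row in dfFinal.itertuples():
    t = row.vectime
    x = row.vecvalue
    x = runningAvg(x)  # runningAvg is my own smoothing helper (not shown)
    DF = pd.DataFrame({'x': x, 't': t})  # long-form frame for seaborn
    sns.lineplot(x='t', y='x', data=DF)
total = (datetime.datetime.now() - start).total_seconds()
print("Total time: ", total)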
This takes 19.157463 seconds.
Why is there such a huge difference? What am I doing wrong that makes it take so long to process a rather small DataFrame?
Seaborn lineplot high cpu; very slow compared to matplotlib.
Behind the scenes, seaborn uses matplotlib to draw its plots. For interactive work, it's recommended to use a Jupyter/IPython interface in matplotlib mode, or else you'll have to call matplotlib.pyplot.show() when you want to see the plot.
Seaborn provides an API on top of Matplotlib that offers sane choices for plot style and color defaults, defines simple high-level functions for common statistical plot types, and integrates with the functionality provided by Pandas DataFrames.
Set ci=None in the call to lineplot; otherwise, confidence intervals will be computed, resulting in some expensive (and unnecessary) df.groupby calls.
An aside: the snakeviz module is a great tool for quickly finding computational bottlenecks.
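As a rough illustration of that workflow, the standard-library cProfile module can write a stats file that snakeviz then opens. The slow_sum function and the plot_loop.prof file name below are hypothetical stand-ins for the plotting loop being measured.

```python
import cProfile
import os
import pstats
import tempfile


def slow_sum(n):
    """Hypothetical stand-in for the code under investigation."""
    return sum(i * i for i in range(n))


# Write the profile to a temporary location; the file name is arbitrary.
profile_path = os.path.join(tempfile.gettempdir(), "plot_loop.prof")

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()
profiler.dump_stats(profile_path)

# Quick text summary of the five most expensive calls by cumulative time.
stats = pstats.Stats(profile_path)
stats.sort_stats("cumulative").print_stats(5)
```

Running `snakeviz plot_loop.prof` from a shell (with the path to the saved file) then opens the same data as an interactive chart in the browser.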