
Plotting two dataframes obtained from a loop in the same graph Python

I would like to plot two dfs with two different colors. For each df, I would need to add two markers. Here is what I have tried:

for stats_file in stats_files:
    data = Graph(stats_file)
    Graph.compute(data)
    data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line')
    plt.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    plt.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()

Using this piece of code, I get the servers_df plotted with markers, but on separate graphs. How can I have both graphs in a single one to compare them better?

Thanks.

Albert asked Nov 16 '20 10:11


4 Answers

TL;DR

Your call to data.servers_df.plot() always creates a new plot, and plt.plot() plots on the latest plot that was created. The solution is to create a dedicated axes object for everything to plot onto.

Preface

I assumed your variables are the following

  • data.servers_df: Dataframe with two float columns "time" and "percentage"
  • data.first_measurement: A dictionary with keys "time" and "percentage", each of which is a list of floats
  • data.second_measurement: A dictionary with keys "time" and "percentage", each of which is a list of floats

I did not reproduce how stats_files is processed, as you did not show what Graph() does. Instead, I created a list of dummy data objects (called stat_files below).

If data.first_measurement and data.second_measurement are also dataframes, let me know and there is an even nicer solution.

Theory - Behind the curtains

Each matplotlib plot (line, bar, etc.) lives on a matplotlib.axes.Axes element. These are like regular axes of a coordinate system. Now two things happen here:

  • When you use plt.plot(), no axes are specified, so matplotlib looks up the current axes element (in the background); if there is none, it creates an empty one, uses it, and sets it as the default. A second call to plt.plot() then finds these axes and uses them.
  • DataFrame.plot(), on the other hand, always creates a new axes element if none is given to it (possible through the ax argument)

So in your code, each call to data.servers_df.plot() creates a new axes element behind the curtains (which then becomes the default), and the two following plt.plot() calls get that default axes and plot onto it. Since this happens once per loop iteration, you get one separate plot per file instead of a single combined one.
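The behaviour described above can be checked with a small sketch. The DataFrame df and its values are made up for illustration, and the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for servers_df
df = pd.DataFrame({"time": [0, 1, 2], "percentage": [10, 40, 20]})

plt.close("all")  # start from a clean state

ax_first = df.plot(x="time", y="percentage")   # no ax given: new axes are created
ax_second = df.plot(x="time", y="percentage")  # again no ax: another new axes

# plt.plot() draws on the current axes, which is the one created last
plt.plot([1], [40], "o", color="orange")

print(ax_first is ax_second)    # whether the two calls shared one axes
print(plt.gca() is ax_second)   # whether the current axes is the newest one
print(len(plt.get_fignums()))   # number of open figures
print(len(ax_first.get_lines()), len(ax_second.get_lines()))  # lines per axes
```

The marker drawn with plt.plot() ends up only on the second axes, which mirrors what happens in each iteration of the loop in the question.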

Solution

The following solution first creates a dedicated matplotlib.axes.Axes using plt.subplots(). This axes element is then used to draw all lines onto. Note especially the ax=ax in data.servers_df.plot(). I also changed your marker format from o- to o, as we only want to display markers (o) and not a connecting line (-). Mock data can be found below.

fig, ax = plt.subplots()  # Here we create the axes that all data will plot onto
for i, data in enumerate(stat_files):
    y_column = f'percentage_{i}'  # Make the columns identifiable
    data.servers_df \
        .rename(columns={'percentage': y_column}) \
        .plot(x='time', y=y_column, linewidth=1, kind='line', ax=ax)
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o', color='green')
plt.show()
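The rename in the loop exists only so that each line gets its own legend entry. A possible alternative, sketched here with made-up data, is to pass a label to DataFrame.plot() directly. The names frames and "servers {i}" are illustrative and not taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the servers_df of two stats files
frames = [
    pd.DataFrame({"time": [0, 1, 2], "percentage": [10, 40, 20]}),
    pd.DataFrame({"time": [0, 1, 2], "percentage": [30, 5, 25]}),
]

fig, ax = plt.subplots()  # one shared axes for all lines
for i, frame in enumerate(frames):
    # label sets the legend entry, so the column does not need to be renamed
    frame.plot(x="time", y="percentage", label=f"servers {i}", linewidth=1, ax=ax)

handles, labels = ax.get_legend_handles_labels()
print(labels)
```

Both variants draw the same lines; the label version just keeps the original column names untouched.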


Mock data

import random

import pandas as pd
import matplotlib.pyplot as plt

# Generation of dummy data
random.seed(1)
NUMBER_OF_DATA_FILES = 2
X_LENGTH = 10


class Data:
    def __init__(self):
        self.servers_df = pd.DataFrame(
            {
                'time': range(X_LENGTH),
                'percentage': [random.randint(0, 10) for _ in range(X_LENGTH)]
            }
        )
        self.first_measurement = {
            'time': self.servers_df['time'].values[:X_LENGTH // 2],
            'percentage': self.servers_df['percentage'].values[:X_LENGTH // 2]
        }
        self.second_measurement = {
            'time': self.servers_df['time'].values[X_LENGTH // 2:],
            'percentage': self.servers_df['percentage'].values[X_LENGTH // 2:]
        }


stat_files = [Data() for _ in range(NUMBER_OF_DATA_FILES)]

BStadlbauer answered Oct 23 '22 11:10


DataFrame.plot() by default returns a matplotlib.axes.Axes object. You should then plot the other two plots on this object:

for stats_file in stats_files:
    data = Graph(stats_file)
    Graph.compute(data)
    ax = data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line')
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()
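As a quick check of the return value, here is a minimal sketch with made-up data (df is hypothetical, and the Agg backend is only there so it runs without a display). Note that inside a loop this pattern still produces one axes per DataFrame.plot() call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical data standing in for servers_df
df = pd.DataFrame({"time": [0, 1, 2, 3], "percentage": [10, 40, 20, 30]})

# DataFrame.plot() hands back the axes it drew on
ax = df.plot(x="time", y="percentage", linewidth=1, kind="line")

# Reuse that axes for the two sets of markers
ax.plot(df["time"].iloc[:2], df["percentage"].iloc[:2], "o", color="orange")
ax.plot(df["time"].iloc[2:], df["percentage"].iloc[2:], "o", color="green")

print(isinstance(ax, Axes))
print(len(ax.get_lines()))  # the line plus the two marker series
```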

If you want to plot them one on top of the others with different colors you can do something like this:

colors = ['C0', 'C1', 'C2']  # matplotlib default color palette
                             # assuming that len(stats_files) = 3
                             # if not you need to specify as many colors as necessary 

ax = plt.subplot(111)
for stats_file, c in zip(stats_files, colors):
    data = Graph(stats_file)
    Graph.compute(data)
    data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line', color=c, ax=ax)
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()

This just changes the color of the servers_df.plot. If you want to change the color of the other two, you can apply the same logic: create a list of the colors that you want them to take at each iteration, iterate over that list, and pass the color value to the color param at each iteration.
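A minimal sketch of that idea, using made-up data instead of Graph(stats_file): the lists line_colors, first_colors and second_colors and the way the frames are split in half are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the per-file data
frames = [
    pd.DataFrame({"time": [0, 1, 2, 3], "percentage": [10, 40, 20, 30]}),
    pd.DataFrame({"time": [0, 1, 2, 3], "percentage": [30, 5, 25, 15]}),
]

# One color per file for each of the three things drawn
line_colors = ["C0", "C1"]
first_colors = ["orange", "red"]
second_colors = ["green", "purple"]

fig, ax = plt.subplots()
for frame, lc, fc, sc in zip(frames, line_colors, first_colors, second_colors):
    half = len(frame) // 2
    frame.plot(x="time", y="percentage", linewidth=1, color=lc, ax=ax, legend=False)
    # First and second half of each frame play the role of the two measurements
    ax.plot(frame["time"].iloc[:half], frame["percentage"].iloc[:half], "o", color=fc)
    ax.plot(frame["time"].iloc[half:], frame["percentage"].iloc[half:], "o", color=sc)

marker_colors = [line.get_color() for line in ax.get_lines()]
print(marker_colors)
```

Because zip() stops at the shortest input, each color list needs at least as many entries as there are files.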

Djib2011 answered Oct 23 '22 11:10


You can create an Axes object for plotting in the first place, for example:

import pandas as pd
import numpy as np 
from matplotlib import pyplot as plt 


df_one = pd.DataFrame({'a':np.linspace(1,10,10),'b':np.linspace(1,10,10)})
df_two = pd.DataFrame({'a':np.random.randint(0,20,10),'b':np.random.randint(0,5,10)})

dfs = [df_one,df_two]
fig,ax = plt.subplots(figsize=(8,6))

colors = ['navy','darkviolet']
markers = ['x','o']
for ind,item in enumerate(dfs):
    ax.plot(item['a'],item['b'],c=colors[ind],marker=markers[ind])
plt.show()

As you can see, the two dataframes are plotted on the same ax with different colors and markers.

meTchaikovsky answered Oct 23 '22 10:10


You need to create the axes first. Afterwards, you can explicitly refer to them when plotting, using df.plot(..., ax=ax) or ax.plot(x, y):

import matplotlib.pyplot as plt

(fig, ax) = plt.subplots(figsize=(20,5))

for stats_file in stats_files:
    data = Graph(stats_file)
    Graph.compute(data)
    data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line', ax=ax)
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()

François B. answered Oct 23 '22 12:10