
Scatter plots in Pandas/Pyplot: How to plot by category [duplicate]

I am trying to make a simple scatter plot in pyplot using a Pandas DataFrame object, and I want an efficient way to plot two variables with the symbols dictated by a third column (key). I have tried various approaches using df.groupby, but without success. A sample script is below. It colours the markers according to 'key1', but I'd like to see a legend with the 'key1' categories. Am I close? Thanks.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.DataFrame(np.random.normal(10, 1, 30).reshape(10, 3),
                      index=pd.date_range('2010-01-01', freq='M', periods=10),
                      columns=('one', 'two', 'three'))
    df['key1'] = (4, 4, 4, 6, 6, 6, 8, 8, 8, 8)

    fig1 = plt.figure(1)
    ax1 = fig1.add_subplot(111)
    ax1.scatter(df['one'], df['two'], marker='o', c=df['key1'], alpha=0.8)
    plt.show()
user2989613 asked Feb 09 '14 02:02


People also ask

What is the difference between plt.plot and plt.scatter?

The primary difference between plt.scatter and plt.plot is that plt.scatter can create scatter plots where the properties of each individual point (size, face color, edge color, etc.) are individually controlled or mapped to data.
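To make that difference concrete, here is a minimal sketch that draws the same points both ways. The random data, the variable names, and the "viridis" colormap are my own choices for illustration, not anything from the question.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x, y = rng.random((2, 15))
sizes = 300 * rng.random(15)    # one marker area per point
values = rng.random(15)         # one colour value per point

fig, (ax_plot, ax_scatter) = plt.subplots(1, 2, figsize=(8, 4))

# plot: a single Line2D, so every marker shares one size and one colour
ax_plot.plot(x, y, marker="o", linestyle="", color="tab:blue")
ax_plot.set_title("plt.plot")

# scatter: size and colour are arrays with one entry per point
ax_scatter.scatter(x, y, s=sizes, c=values, cmap="viridis", alpha=0.8)
ax_scatter.set_title("plt.scatter")
```

Call plt.show() afterwards to display the figure.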


1 Answer

You can use scatter for this, but that requires having numerical values for your key1, and you won't have a legend, as you noticed.

It's better to just use plot for discrete categories like this. For example:

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    np.random.seed(1974)

    # Generate Data
    num = 20
    x, y = np.random.random((2, num))
    labels = np.random.choice(['a', 'b', 'c'], num)
    df = pd.DataFrame(dict(x=x, y=y, label=labels))

    groups = df.groupby('label')

    # Plot
    fig, ax = plt.subplots()
    ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
    for name, group in groups:
        ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
    ax.legend()

    plt.show()
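Applied to the DataFrame from the question, the same groupby loop might look like the sketch below. The seed is arbitrary, I've dropped the date index since it plays no part in the plot, and the legend title is my own addition.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, only for reproducibility

# Same shape of data as in the question (date index omitted)
df = pd.DataFrame(
    np.random.normal(10, 1, 30).reshape(10, 3),
    columns=["one", "two", "three"],
)
df["key1"] = [4, 4, 4, 6, 6, 6, 8, 8, 8, 8]

fig, ax = plt.subplots()
ax.margins(0.05)
# One plot call per key1 value, so each category gets its own legend entry
for key, group in df.groupby("key1"):
    ax.plot(group["one"], group["two"], marker="o", linestyle="",
            alpha=0.8, label=str(key))
ax.legend(title="key1", numpoints=1)
```

Because each category is a separate line, matplotlib's colour cycle assigns the colours and the numeric key1 values only serve as labels. Call plt.show() to display the figure.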


If you'd like things to look like the default pandas style, then just update the rcParams with the pandas stylesheet and use its color generator. (I'm also tweaking the legend slightly):

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    np.random.seed(1974)

    # Generate Data
    num = 20
    x, y = np.random.random((2, num))
    labels = np.random.choice(['a', 'b', 'c'], num)
    df = pd.DataFrame(dict(x=x, y=y, label=labels))

    groups = df.groupby('label')

    # Plot
    # Note: pd.tools.plotting and ax.set_color_cycle come from older pandas
    # and matplotlib releases and are not available in current versions.
    plt.rcParams.update(pd.tools.plotting.mpl_stylesheet)
    colors = pd.tools.plotting._get_standard_colors(len(groups), color_type='random')

    fig, ax = plt.subplots()
    ax.set_color_cycle(colors)
    ax.margins(0.05)
    for name, group in groups:
        ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
    ax.legend(numpoints=1, loc='upper left')

    plt.show()
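On recent pandas and matplotlib versions, a rough equivalent can be sketched with matplotlib's own style sheets and ax.set_prop_cycle. This is an approximation under my own assumptions: the built-in "ggplot" style stands in for the old pandas stylesheet (it is not an exact match), and colours sampled from the "viridis" colormap stand in for the pandas colour generator.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(1974)

# Generate data (same as above)
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(["a", "b", "c"], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))
groups = df.groupby("label")

# Assumption: "ggplot" is only a stand-in for the old pandas look
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    # Assumption: evenly spaced colormap samples replace the pandas colours
    cmap = plt.get_cmap("viridis")
    colors = [cmap(v) for v in np.linspace(0, 1, len(groups))]
    ax.set_prop_cycle(color=colors)
    ax.margins(0.05)
    for name, group in groups:
        ax.plot(group.x, group.y, marker="o", linestyle="", ms=12, label=name)
    ax.legend(numpoints=1, loc="upper left")
```

Using plt.style.context keeps the style change local to this figure instead of updating rcParams globally. Call plt.show() to display the figure.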


Joe Kington answered Sep 22 '22 14:09
