Scatter plots in Pandas/Pyplot: How to plot by category [duplicate]

Tags: python, pandas, matplotlib

I am trying to make a simple scatter plot in pyplot from a pandas DataFrame. I want an efficient way to plot two variables with the marker symbols dictated by a third column (the key). I have tried various approaches using df.groupby, without success. A sample script is below. It colours the markers according to 'key1', but I'd like to see a legend with the 'key1' categories. Am I close? Thanks.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3),
                      index = pd.date_range('2010-01-01', freq = 'M', periods = 10),
                      columns = ('one', 'two', 'three'))
    df['key1'] = (4,4,4,6,6,6,8,8,8,8)

    fig1 = plt.figure(1)
    ax1 = fig1.add_subplot(111)
    ax1.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)
    plt.show()


asked Feb 09 '14 02:02

user2989613

1 Answer

You can use scatter for this, but that requires having numerical values for your key1, and you won't have a legend, as you noticed.
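To make the numeric-values point concrete, here is a minimal sketch (not part of the original answer) that assumes string labels and converts them to integer codes with `pd.factorize` before passing them to `scatter`. The DataFrame and its column names are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: two numeric columns and one string category column.
rng = np.random.default_rng(1974)
df = pd.DataFrame({
    "x": rng.random(20),
    "y": rng.random(20),
    "label": rng.choice(["a", "b", "c"], 20),
})

# Colormapping in scatter works on numbers, so map each label to an integer code.
codes, uniques = pd.factorize(df["label"])

fig, ax = plt.subplots()
points = ax.scatter(df["x"], df["y"], c=codes, alpha=0.8)
# At this point the axes has no legend and no per-category entries to build one from.
# plt.show()  # uncomment when running interactively
```

The markers are coloured by category, but nothing on the axes says which colour belongs to which label.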

It's better to just use plot for discrete categories like this. For example:

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    np.random.seed(1974)

    # Generate Data
    num = 20
    x, y = np.random.random((2, num))
    labels = np.random.choice(['a', 'b', 'c'], num)
    df = pd.DataFrame(dict(x=x, y=y, label=labels))

    groups = df.groupby('label')

    # Plot
    fig, ax = plt.subplots()
    ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
    for name, group in groups:
        ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
    ax.legend()

    plt.show()

[Image: https://i.stack.imgur.com/Svrkn.png]
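The same pattern can be applied to the DataFrame from the question. This is a sketch that assumes the question's column names ('one', 'two', 'key1'); the seed is arbitrary and the date index is left out for brevity.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, only for reproducibility

# Same shape and column names as the DataFrame in the question.
df = pd.DataFrame(np.random.normal(10, 1, 30).reshape(10, 3),
                  columns=["one", "two", "three"])
df["key1"] = [4, 4, 4, 6, 6, 6, 8, 8, 8, 8]

fig, ax = plt.subplots()
# One plot() call per key value gives each category its own legend entry.
for key, group in df.groupby("key1"):
    ax.plot(group["one"], group["two"], marker="o", linestyle="",
            alpha=0.8, label=str(key))
legend = ax.legend(title="key1")
# plt.show()  # uncomment when running interactively
```

Each group becomes a separate line artist with markers only, so the legend lists the three key values.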

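If you'd like things to look like the default pandas style, update the rcParams with the pandas stylesheet and use its color generator (I'm also tweaking the legend slightly). Note that pd.tools.plotting and ax.set_color_cycle belong to older pandas and matplotlib releases and are not available in current versions: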

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    np.random.seed(1974)

    # Generate Data
    num = 20
    x, y = np.random.random((2, num))
    labels = np.random.choice(['a', 'b', 'c'], num)
    df = pd.DataFrame(dict(x=x, y=y, label=labels))

    groups = df.groupby('label')

    # Plot
    plt.rcParams.update(pd.tools.plotting.mpl_stylesheet)
    colors = pd.tools.plotting._get_standard_colors(len(groups), color_type='random')

    fig, ax = plt.subplots()
    ax.set_color_cycle(colors)
    ax.margins(0.05)
    for name, group in groups:
        ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
    ax.legend(numpoints=1, loc='upper left')

    plt.show()

[Image: https://i.stack.imgur.com/VuZeq.png]
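On current library versions, a rough equivalent can be sketched with matplotlib's own style sheets and `Axes.set_prop_cycle`. This is an assumption-laden substitute, not the answer's original method: the built-in "ggplot" style stands in for the pandas stylesheet, and the colour list is a hypothetical choice rather than the output of the pandas colour generator.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(1974)

# Generate data as in the answer above.
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(["a", "b", "c"], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))
groups = df.groupby("label")

# Assumption: any list of valid matplotlib colours works here.
colors = ["tab:blue", "tab:orange", "tab:green"]

# Assumption: "ggplot" is used as a stand-in style; it is not the pandas stylesheet.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.set_prop_cycle(color=colors)  # modern replacement for set_color_cycle
    ax.margins(0.05)
    for name, group in groups:
        ax.plot(group.x, group.y, marker="o", linestyle="", ms=12, label=name)
    legend = ax.legend(numpoints=1, loc="upper left")
# plt.show()  # uncomment when running interactively
```

Using `plt.style.context` keeps the style change local to this figure instead of altering the global rcParams.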

answered Sep 22 '22 14:09

Joe Kington
