making matplotlib scatter plots from dataframes in Python's pandas

Vary scatter point size based on another column

plt.scatter(df.col1, df.col2, s=df.col3)
# OR (with pandas 0.13 and up)
df.plot(kind='scatter', x='col1', y='col2', s=df.col3)
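
Both calls above assume that df already exists and that numpy, pandas and matplotlib are imported. A minimal self-contained sketch follows; the sample data, the seed and the names rng, fig, ax and points are assumptions made up for illustration and are not part of the original answer. Note that s is interpreted as the marker area in points squared, so larger col3 values give visibly larger markers.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove for interactive use
import matplotlib.pyplot as plt

# Made-up sample data; the column names match the snippets above.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "col1": rng.normal(size=10),
    "col2": rng.normal(size=10),
    "col3": np.arange(10) ** 2 * 10 + 20,  # marker areas in points^2
})

fig, ax = plt.subplots()
points = ax.scatter(df.col1, df.col2, s=df.col3)  # one marker size per row
ax.set_xlabel("col1")
ax.set_ylabel("col2")
```

To view the result, call fig.savefig with a file name of your choice or, with an interactive backend, plt.show().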

Vary scatter point color based on another column

colors = np.where(df.col3 > 300, 'r', 'k')
plt.scatter(df.col1, df.col2, s=120, c=colors)
# OR (with pandas 0.13 and up)
df.plot(kind='scatter', x='col1', y='col2', s=120, c=colors)

Scatter plot with legend

The easiest way I've found to create a scatter plot with a legend is to call plt.scatter once for each point type, giving each call its own label.

cond = df.col3 > 300
subset_a = df[cond].dropna()
subset_b = df[~cond].dropna()
plt.scatter(subset_a.col1, subset_a.col2, s=120, c='b', label='col3 > 300')
plt.scatter(subset_b.col1, subset_b.col2, s=60, c='r', label='col3 <= 300') 
plt.legend()

Update

From what I can tell, matplotlib simply skips points with NA x/y coordinates or NA style settings (e.g., color/size). To find points skipped due to NA, try the isnull method: df[df.col3.isnull()]
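
As a rough sketch of that check, the following builds a small made-up DataFrame (the values and the names missing_style, missing_any and plottable are assumptions for illustration) and separates rows with missing values from rows that are complete:

```python
import numpy as np
import pandas as pd

# Made-up data with a missing value in each column
df = pd.DataFrame({
    "col1": [1.0, 2.0, np.nan, 4.0],
    "col2": [1.0, np.nan, 3.0, 4.0],
    "col3": [100.0, 200.0, 300.0, np.nan],
})

# Rows where the size/color column is missing
missing_style = df[df.col3.isnull()]

# Rows where any of the plotted columns is missing
missing_any = df[df[["col1", "col2", "col3"]].isnull().any(axis=1)]

# Rows with no missing values in the plotted columns
plottable = df.dropna(subset=["col1", "col2", "col3"])
```

Comparing len(plottable) with len(df) gives a quick count of how many rows have a missing value in a plotted column.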

To split a list of points into many types, take a look at numpy select, which is a vectorized if-then-else implementation and accepts an optional default value. For example:

df['subset'] = np.select([df.col3 < 150, df.col3 < 400, df.col3 < 600],
                         [0, 1, 2], -1)
for color, label in zip('bgrm', [0, 1, 2, -1]):
    subset = df[df.subset == label]
    plt.scatter(subset.col1, subset.col2, s=120, c=color, label=str(label))
plt.legend()
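
To see how np.select assigns the labels, here is a small sketch on a plain array (the values in col3 are made up for illustration). Conditions are evaluated in order, the first one that holds determines the label, and the default applies when none holds:

```python
import numpy as np

col3 = np.array([100, 200, 500, 700])  # made-up values

# The first matching condition wins; -1 is the default when none match.
labels = np.select([col3 < 150, col3 < 400, col3 < 600], [0, 1, 2], -1)
print(labels.tolist())  # [0, 1, 2, -1]
```

The value 100 also satisfies col3 < 400 and col3 < 600, but it gets label 0 because col3 < 150 is listed first.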

There is little to be added to Garrett's great answer, but pandas also has a scatter method. Using that, it's as easy as

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2), columns=['col1', 'col2'])
df['col3'] = np.arange(len(df))**2 * 100 + 100
df.plot.scatter('col1', 'col2', s=df['col3'])  # col3 sets the marker size

[Figure: scatter of col1 vs. col2 with point sizes taken from col3]

I recommend an alternative using seaborn, which offers a higher-level interface for data plotting. You can use seaborn's scatterplot and map the third column (col_name_3 below) to both hue and size.

Working code:

import pandas as pd
import seaborn as sns
import numpy as np

# create sample data
sample_data = {'col_name_1': np.random.rand(20),
               'col_name_2': np.random.rand(20),
               'col_name_3': np.arange(20) * 100}
df = pd.DataFrame(sample_data)
# col_name_3 drives both the marker color (hue) and the marker size
sns.scatterplot(x="col_name_1", y="col_name_2", data=df,
                hue="col_name_3", size="col_name_3")
