Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Annotate data points while plotting from Pandas DataFrame

Here's a (very) slightly slicker version of Dan Allan's answer:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import string

df = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)}, 
                  index=list(string.ascii_lowercase[:10]))

Which gives:

          x         y
a  0.541974  0.042185
b  0.036188  0.775425
c  0.950099  0.888305
d  0.739367  0.638368
e  0.739910  0.596037
f  0.974529  0.111819
g  0.640637  0.161805
h  0.554600  0.172221
i  0.718941  0.192932
j  0.447242  0.172469

And then:

fig, ax = plt.subplots()
df.plot('x', 'y', kind='scatter', ax=ax)

for k, v in df.iterrows():
    ax.annotate(k, v)

Finally, if you're in interactive mode you might need to refresh the plot:

fig.canvas.draw()

Which produces: Boring scatter plot

Or, since that looks incredibly ugly, you can beautify things a bit pretty easily:

from matplotlib import cm
cmap = cm.get_cmap('Spectral')
df.plot('x', 'y', kind='scatter', ax=ax, s=120, linewidth=0, 
        c=range(len(df)), colormap=cmap)

for k, v in df.iterrows():
    ax.annotate(k, v,
                xytext=(10,-5), textcoords='offset points',
                family='sans-serif', fontsize=18, color='darkslategrey')

Which looks a lot nicer: Nice scatter plot


Do you want to use one of the other columns as the text of the annotation? This is something I did recently.

Starting with some example data

In [1]: df
Out[1]: 
           x         y val
 0 -1.015235  0.840049   a
 1 -0.427016  0.880745   b
 2  0.744470 -0.401485   c
 3  1.334952 -0.708141   d
 4  0.127634 -1.335107   e

Plot the points. I plot y against x, in this example.

ax = df.set_index('x')['y'].plot(style='o')

Write a function that loops over x, y, and the value to annotate beside the point.

def label_point(x, y, val, ax):
    a = pd.concat({'x': x, 'y': y, 'val': val}, axis=1)
    for i, point in a.iterrows():
        ax.text(point['x'], point['y'], str(point['val']))

label_point(df.x, df.y, df.val, ax)

draw()

Annotated Points


Let's assume your df has multiple columns, and three of which are x, y, and lbl. To annotate your (x,y) scatter plot with lbl, simply:

ax = df.plot(kind='scatter',x='x',y='y')
df[['x','y','lbl']].apply(lambda row: ax.text(*row),axis=1);

I found the previous answers quite helpful, especially LondonRob's example that improved the layout a bit.

The only thing that bothered me is that I don't like pulling data out of DataFrames to then loop over them. Seems a waste of the DataFrame.

Here was an alternative that avoids the loop using .apply(), and includes the nicer-looking annotations (I thought the color scale was a bit overkill and couldn't get the colorbar to go away):

ax = df.plot('x', 'y', kind='scatter', s=50 )

def annotate_df(row):  
    ax.annotate(row.name, row.values,
                xytext=(10,-5), 
                textcoords='offset points',
                size=18, 
                color='darkslategrey')
    
_ = df.apply(annotate_df, axis=1)

enter image description here

Edit Notes

I edited my code example recently. Originally it used the same:

fig, ax = plt.subplots()

as the other posts to expose the axes, however this is unnecessary and makes the:

import matplotlib.pyplot as plt

line also unnecessary.

Also note:

  • If you are trying to reproduce this example and your plots don't have the points in the same place as any of ours, it may be because the DataFrame was using random values. It probably would have been less confusing if we'd used a fixed data table or a random seed.
  • Depending on the points, you may have to play with the xytext values to get better placements.