I recently saw this package for R/ggplot2, which lets you place multiple annotations on a plot and automatically adjusts their positions to minimize overlap, improving readability. Is there anything similar available for Python/matplotlib?
EDIT: I've found Matplotlib overlapping annotations / text, which looks promising, but the result seems inferior to that of the R package.
Example:
from matplotlib import pyplot as plt
import numpy as np
xs = np.arange(10, step=0.1)+np.random.random(100)*3
ys = np.arange(10, step=0.1)+np.random.random(100)*3
labels = np.arange(100)
plt.scatter(xs, ys)
for x, y, s in zip(xs, ys, labels):
    plt.text(x, y, s)
plt.show()
You can see that even such short labels create a crazy mess when the data density is high.
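To put a number on the mess, one option is to count how many pairs of label bounding boxes overlap once the figure is drawn. The following is only a rough sketch: the helper name count_label_overlaps is made up for illustration, the Agg backend is forced so that it runs without a display, and the random data mimics the example above rather than reproducing it exactly.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so no window is needed
from matplotlib import pyplot as plt
import numpy as np


def count_label_overlaps(texts, renderer):
    """Count pairs of Text artists whose rendered bounding boxes overlap.

    Hypothetical helper for illustration; not part of matplotlib.
    """
    boxes = [t.get_window_extent(renderer) for t in texts]
    return sum(
        bool(boxes[i].overlaps(boxes[j]))
        for i in range(len(boxes))
        for j in range(i + 1, len(boxes))
    )


rng = np.random.RandomState(2016)
xs = np.arange(10, step=0.1) + rng.random_sample(100) * 3
ys = np.arange(10, step=0.1) + rng.random_sample(100) * 3

fig, ax = plt.subplots()
ax.scatter(xs, ys)
texts = [ax.text(x, y, str(s)) for x, y, s in zip(xs, ys, range(100))]

fig.canvas.draw()  # text extents are only meaningful after a draw
n_overlaps = count_label_overlaps(texts, fig.canvas.get_renderer())
print("overlapping label pairs:", n_overlaps)
```

The same count can be taken again after any label-adjustment step to compare approaches.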
Use the legend() method to avoid overlapping labels and autopct values. To display the figure, use the show() method.
Dot size. You can try decreasing the marker size in your plot. The points will then overlap less and the patterns will be clearer.
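A minimal sketch of the marker-size suggestion, assuming data like that in the question. The value s=5 is an arbitrary choice for illustration; in scatter(), s is the marker area in points squared.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so no window is needed
from matplotlib import pyplot as plt
import numpy as np

rng = np.random.RandomState(2016)
xs = np.arange(10, step=0.1) + rng.random_sample(100) * 3
ys = np.arange(10, step=0.1) + rng.random_sample(100) * 3

fig, ax = plt.subplots()
# Smaller markers leave more room between points (and for any labels)
small = ax.scatter(xs, ys, s=5)
```

This only reduces overlap between the markers themselves; it does not move the text labels.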
[12-11-2016 updated the code and second figure again since the library has been significantly improved since then]
ANSWER COMPLETELY REWRITTEN
I've made a small library for this purpose, which works similarly to the above-mentioned ggrepel: https://github.com/Phlya/adjustText
With repelling from points switched off, it produces something decent even for this difficult example:
from matplotlib import pyplot as plt
from adjustText import adjust_text
import numpy as np
np.random.seed(2016)
xs = np.arange(10, step=0.1) + np.random.random(100) * 3
ys = np.arange(10, step=0.1) + np.random.random(100) * 3
labels = np.arange(100)
f = plt.figure()
scatter = plt.scatter(xs, ys, s=15, c='r', edgecolors='w')
texts = []
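for x, y, s in zip(xs, ys, labels):
    texts.append(plt.text(x, y, s))
# First figure: labels before adjustment. When running everything as a single
# script, drop this call so that the adjusted figure below is the one displayed.
plt.show()
adjust_text(texts, force_points=0.2, force_text=0.2,
            expand_points=(1, 1), expand_text=(1, 1),
            arrowprops=dict(arrowstyle="-", color='black', lw=0.5))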
plt.show()
Building on tcaswell's answer, you could repel labels using networkx's spring_layout,
which implements the Fruchterman-Reingold force-directed layout algorithm:
import matplotlib.pyplot as plt
import numpy as np
import networkx as nx
np.random.seed(2016)
xs = np.arange(10, step=0.1)+np.random.random(100)*3
ys = np.arange(10, step=0.1)+np.random.random(100)*3
labels = np.arange(100)
def repel_labels(ax, x, y, labels, k=0.01):
    # one fixed node per data point, one free node per label, joined by an edge
    G = nx.DiGraph()
    data_nodes = []
    init_pos = {}
    for xi, yi, label in zip(x, y, labels):
        data_str = 'data_{0}'.format(label)
        G.add_node(data_str)
        G.add_node(label)
        G.add_edge(label, data_str)
        data_nodes.append(data_str)
        init_pos[data_str] = (xi, yi)
        init_pos[label] = (xi, yi)
    pos = nx.spring_layout(G, pos=init_pos, fixed=data_nodes, k=k)
    # undo spring_layout's rescaling
    pos_after = np.vstack([pos[d] for d in data_nodes])
    pos_before = np.vstack([init_pos[d] for d in data_nodes])
    scale, shift_x = np.polyfit(pos_after[:,0], pos_before[:,0], 1)
    scale, shift_y = np.polyfit(pos_after[:,1], pos_before[:,1], 1)
    shift = np.array([shift_x, shift_y])
    # dict.items() replaces the Python 2-only dict.iteritems()
    for key, val in pos.items():
        pos[key] = (val*scale) + shift
    # draw each label with an arrow pointing back to its data point
    for label, data_str in G.edges():
        ax.annotate(label,
                    xy=pos[data_str], xycoords='data',
                    xytext=pos[label], textcoords='data',
                    arrowprops=dict(arrowstyle="->",
                                    shrinkA=0, shrinkB=0,
                                    connectionstyle="arc3",
                                    color='red'), )
    # expand limits (list() is needed because dict.values() is a view in Python 3)
    all_pos = np.vstack(list(pos.values()))
    x_span, y_span = np.ptp(all_pos, axis=0)
    mins = np.min(all_pos-x_span*0.15, 0)
    maxs = np.max(all_pos+y_span*0.15, 0)
    ax.set_xlim([mins[0], maxs[0]])
    ax.set_ylim([mins[1], maxs[1]])
fig, ax = plt.subplots()
ax.plot(xs, ys, 'o')
repel_labels(ax, xs, ys, labels, k=0.0025)
plt.show()
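yields a plot in which each label is pushed away from its data point and connected back to it by a red arrow.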
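For intuition about what a force-directed layout does, here is a simplified NumPy-only sketch of the idea: each label is pulled toward its own data point and pushed away from every other label. This is not networkx's implementation. The function name repel_points and the parameters k, iterations, step and t0 are made up for illustration, and the force laws (repulsion proportional to k²/d, attraction proportional to d²/k) follow the usual Fruchterman-Reingold description.

```python
import numpy as np


def repel_points(anchors, k=0.5, iterations=50, step=0.3, t0=0.1):
    """Simplified force-directed sketch (hypothetical helper, not networkx).

    anchors: (n, 2) array of data points; returns (n, 2) label positions.
    """
    anchors = np.asarray(anchors, dtype=float)
    rng = np.random.RandomState(0)
    # tiny jitter so labels that start on the same spot can separate
    labels = anchors + rng.normal(scale=1e-3, size=anchors.shape)
    for i in range(iterations):
        # pairwise offsets and distances between labels
        delta = labels[:, None, :] - labels[None, :, :]
        dist = np.linalg.norm(delta, axis=-1)
        np.fill_diagonal(dist, np.inf)   # a label does not repel itself
        dist = np.maximum(dist, 1e-2)    # clip to avoid huge forces
        # repulsion of magnitude k^2 / d along each pairwise direction
        repulsion = (delta * (k ** 2 / dist ** 2)[..., None]).sum(axis=1)
        # attraction of magnitude d^2 / k toward the label's own anchor
        to_anchor = anchors - labels
        d_anchor = np.linalg.norm(to_anchor, axis=1, keepdims=True)
        attraction = to_anchor * d_anchor / k
        disp = step * (repulsion + attraction)
        # "temperature" caps the move length and cools linearly
        t = t0 * (1.0 - i / iterations)
        length = np.linalg.norm(disp, axis=1, keepdims=True)
        labels += disp * np.minimum(1.0, t / np.maximum(length, 1e-12))
    return labels


# Two data points on the same spot and one far away
anchors = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 5.0]])
spread = repel_points(anchors)
print(spread)
```

In the networkx version above, the data points are the fixed nodes and the edges between each label and its data point supply the attraction; here that pull is written out explicitly.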