I am new to Python and am trying to save a large amount of data into a PDF with figures, using matplotlib's PdfPages and subplots. The problem is that I have hit a bottleneck I don't know how to solve. The code goes something like this:
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# x1, y1, x2, y2 stand for the data to plot (not shown here)
with PdfPages('myfigures.pdf') as pdf:
    for i in range(1000):
        # a new figure with a 2x3 grid of axes is created for every page
        f, axarr = plt.subplots(2, 3)
        axarr[0, 0].plot(x1, y1)
        axarr[1, 0].plot(x2, y2)
        pdf.savefig(f)
        plt.close('all')
Creating a figure on each loop iteration is very time consuming, but if I move it outside the loop, the plots are not cleared between pages. Other options I tried, such as clear() or clf(), either did not work or ended up creating multiple different figures. Does anyone have an idea how to restructure this so that it runs faster?
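One possible way around the per-page figure creation, offered as a minimal sketch rather than a definitive fix: build the figure and the line artists once, then only swap the data with set_data() before each pdf.savefig() call. The sine/cosine data, the helper name save_pages_reusing_figure, the Agg backend and the temporary output path are all assumptions made for illustration; they are not part of the question.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: no GUI window is needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages


def save_pages_reusing_figure(path, n_pages):
    """Write n_pages pages to path, reusing one figure and two line artists."""
    fig, axarr = plt.subplots(2, 3)          # created once, outside the loop
    (line1,) = axarr[0, 0].plot([], [])      # empty lines, filled per page
    (line2,) = axarr[1, 0].plot([], [])
    x = np.linspace(0, 2 * np.pi, 50)        # placeholder data
    with PdfPages(path) as pdf:
        for i in range(n_pages):
            line1.set_data(x, np.sin(x + i))
            line2.set_data(x, np.cos(x + i))
            for ax in (axarr[0, 0], axarr[1, 0]):
                ax.relim()                   # recompute data limits
                ax.autoscale_view()          # rescale the axes to the new data
            pdf.savefig(fig)
        pages = pdf.get_pagecount()
    plt.close(fig)
    return pages


out_path = os.path.join(tempfile.mkdtemp(), "myfigures.pdf")
n_written = save_pages_reusing_figure(out_path, 5)
```

Whether this helps depends on where the time actually goes; if rendering the page dominates, the gain from reusing the figure may be modest.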
Create another figure (fig2), or activate an existing figure, using the figure() method. Plot the second line using the plot() method. Initialize a variable, filename, with the name of the PDF file. Create a user-defined function, save_multi_image(), to save multiple figures in a PDF file.
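The steps above might look roughly like the following sketch. The body of save_multi_image() is an assumption (the snippet only names the function); here it walks over every open pyplot figure and writes each one as a page. The plotted values and the temporary file location are placeholders.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: no GUI window is needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages


def save_multi_image(filename):
    """Save every currently open pyplot figure as one page of filename."""
    with PdfPages(filename) as pdf:
        for num in plt.get_fignums():
            pdf.savefig(plt.figure(num))  # plt.figure(num) activates figure num
        return pdf.get_pagecount()


plt.close("all")                 # start from a clean slate
fig1 = plt.figure()
plt.plot([1, 2, 3], [1, 4, 9])   # first line, on fig1
fig2 = plt.figure()
plt.plot([1, 2, 3], [9, 4, 1])   # second line, on fig2

filename = os.path.join(tempfile.mkdtemp(), "multi.pdf")
page_count = save_multi_image(filename)
plt.close("all")
```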
Saving a plot to disk as an image file: if you want to save matplotlib figures as image files programmatically, all you need is the matplotlib.pyplot.savefig() function. Pass the desired filename (optionally including a path) and the figure will be stored on your disk.
You can output each plot as an image, perhaps into a new, separate directory, while running your notebook, and then at the end of the notebook add a section that uses ReportLab or Pillow to iterate over the images in that directory and composite them together as you wish.
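A rough sketch of the image-directory approach using Pillow (ReportLab is not shown). The helper name figures_to_pdf_via_images, the file naming scheme and the dpi value are assumptions for illustration. Images are converted to RGB because the PNG files written by matplotlib carry an alpha channel.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: no GUI window is needed
import matplotlib.pyplot as plt
from PIL import Image


def figures_to_pdf_via_images(figs, image_dir, pdf_path):
    """Save each figure as a PNG in image_dir, then combine the PNGs into one PDF."""
    paths = []
    for i, fig in enumerate(figs):
        path = os.path.join(image_dir, f"plot_{i:03d}.png")
        fig.savefig(path, dpi=100)
        paths.append(path)
    # Iterate over the saved images and composite them into a multipage PDF
    pages = [Image.open(p).convert("RGB") for p in paths]
    pages[0].save(pdf_path, save_all=True, append_images=pages[1:])
    return paths


figs = []
for k in range(3):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, k, 2 * k])  # placeholder data
    figs.append(fig)

work_dir = tempfile.mkdtemp()
pdf_path = os.path.join(work_dir, "combined.pdf")
image_paths = figures_to_pdf_via_images(figs, work_dir, pdf_path)
plt.close("all")
```

Pages produced this way are raster images, so text and lines are not vector graphics as they would be with PdfPages.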
To save the file in PDF format, use the savefig() method with the file name myImagePDF.pdf and format="pdf". To show the image, use the plt.show() method.
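The two savefig() uses described in these snippets, as a small sketch. The plotted values, the PNG file name and the temporary directory are placeholders; plt.show() is left out because the sketch assumes a non-interactive backend.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: no GUI window is needed
import matplotlib.pyplot as plt

out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "myImage.png")
pdf_path = os.path.join(out_dir, "myImagePDF.pdf")

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])          # placeholder data

fig.savefig(png_path)                  # format inferred from the .png extension
fig.savefig(pdf_path, format="pdf")    # format given explicitly
plt.close(fig)
```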
matplotlib: subplot axes arrays per PDF page; save (append) each page as its matrix of subplots becomes completely full → then create a new page, repeat, etc.

To contain large numbers of subplots as multipage output inside a single PDF, start filling the first page with your plot(s) right away. After each subplot is added during the iteration, check whether it has used up the last available slot in the current page's 𝑚-rows × 𝑛-cols subplot layout (i.e., an 𝑚 × 𝑛 matrix of subplots); if it has, save that page and create a new one.
import matplotlib
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
import numpy as np

matplotlib.rcParams.update({"font.size": 6})

# Dimensions for any m-rows × n-cols array of subplots / pg.
m, n = 4, 5

# Don't forget to indent after the with statement
with PdfPages("auto_subplotting.pdf") as pdf:
    # Before beginning the iteration through all the data,
    # initialize the layout for the plots and create a
    # representation of the subplots that can be easily
    # iterated over for knowing when to create the next page
    # (and also for custom settings like partial axes labels)
    f, axarr = plt.subplots(m, n, sharex="col", sharey="row")
    arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
    subplots = [axarr[index] for index in arr_ij]

    # To conserve needed plotting real estate,
    # only label the bottom row and leftmost subplots
    # as determined automatically using m and n
    splot_index = 0
    for s, splot in enumerate(subplots):
        splot.set_ylim(0, 0.15)
        splot.set_xlim(0, 50)
        last_row = m * n - s < n + 1
        first_in_row = s % n == 0
        if last_row:
            splot.set_xlabel("X-axis label")
        if first_in_row:
            splot.set_ylabel("Y-axis label")

    # Iterate through each sample in the data
    for sample in range(33):
        # As a stand-in for real data, let's just make numpy take 100 random draws
        # from a poisson distribution centered around say ~25 and then display
        # the outcome as a histogram
        scaled_y = np.random.randint(20, 30)
        random_data = np.random.poisson(scaled_y, 100)
        subplots[splot_index].hist(
            random_data,
            bins=12,
            density=True,  # normalize to a probability density (formerly normed=True)
            fc=(0, 0, 0, 0),
            lw=0.75,
            ec="b",
        )

        # Keep collecting subplots (into the mpl-created array;
        # see: [1]) through the samples in the data and increment
        # a counter each time. The page will be full once the count is equal
        # to the product of the user-set dimensions (i.e. m * n)
        splot_index += 1

        # Once an mxn number of subplots have been collected
        # you now have a full page's worth, and it's time to
        # close and save to pdf that page and re-initialize for a
        # new page possibly. We can basically repeat the same
        # exact code block used for the first layout
        # initialization, but with the addition of 3 new lines:
        #   +2 for creating & saving the just-finished pdf page,
        #   +1 more to reset the subplot index (back to zero)
        if splot_index == m * n:
            pdf.savefig()
            plt.close(f)
            f, axarr = plt.subplots(m, n, sharex="col", sharey="row")
            arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
            subplots = [axarr[index] for index in arr_ij]
            splot_index = 0
            for s, splot in enumerate(subplots):
                splot.set_ylim(0, 0.15)
                splot.set_xlim(0, 50)
                last_row = (m * n) - s < n + 1
                first_in_row = s % n == 0
                if last_row:
                    splot.set_xlabel("X-axis label")
                if first_in_row:
                    splot.set_ylabel("Y-axis label")

    # Done!
    # But don't forget to save to pdf after the last page
    pdf.savefig()
    plt.close(f)
For any m×n layout, just change the values assigned to m and n. With the code above (where "m, n = 4, 5"), a 4×5 matrix of subplots holding a total of 33 samples is produced as a two-page PDF output file.
Note: On the final page of the multipage PDF there will be a number of blank subplots equal to the total number of subplot slots across all pages (pages × 𝑚 × 𝑛) minus your total number of samples. E.g., with m=3 and n=4 you get 3 rows of 4 subplots each, i.e. 12 per page; with 20 samples, a two-page PDF is created with a total of 24 subplots, and the last 4 subplots on the second page (the full bottom row in this hypothetical example) are empty.
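The arithmetic in the note can be checked with a few lines of Python. The helper name page_layout is hypothetical, and it assumes a page is only created when at least one sample lands on it.

```python
import math


def page_layout(num_samples, m, n):
    """Return (number of pages, blank subplots on the last page) for an m x n grid."""
    per_page = m * n
    pages = math.ceil(num_samples / per_page)
    blanks = pages * per_page - num_samples
    return pages, blanks


print(page_layout(20, 3, 4))  # (2, 4)   the example from the note
print(page_layout(33, 4, 5))  # (2, 7)   the 4x5 listing with 33 samples
```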
seaborn
The multipage handling should probably be simplified by creating a new_page function; it's better not to repeat code verbatim, especially once you start customizing the plots, because you won't want to mirror every change and type the same thing twice. A more customized aesthetic based on seaborn, making use of the available matplotlib parameters as shown below, might be preferable too.

Add a new_page function & some customizations for the subplot style:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib.backends.backend_pdf import PdfPages

# this erases labels for any blank plots on the last page
sns.set(font_scale=0.0)

m, n = 4, 6
datasize = 37
# 37 % (m*n) = 13, (m*n) - 13 = 24 - 13 = 11. Thus 11 blank subplots on final page

# custom colors scheme / palette
ctheme = [
    "k", "gray", "magenta", "fuchsia", "#be03fd", "#1e488f",
    (0.44313725490196076, 0.44313725490196076, 0.88627450980392153), "#75bbfd",
    "teal", "lime", "g", (0.6666674, 0.6666663, 0.29078014184397138), "y",
    "#f1da7a", "tan", "orange", "maroon", "r", ]  # pick whatever colors you wish
colors = sns.blend_palette(ctheme, datasize)
fz = 7  # labels fontsize


def new_page(m, n):
    # Start a fresh page: reset the shared counter and build a styled m x n grid
    global splot_index
    splot_index = 0
    fig, axarr = plt.subplots(m, n, sharey="row")
    plt.subplots_adjust(hspace=0.5, wspace=0.15)
    arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
    subplots = [axarr[index] for index in arr_ij]
    for s, splot in enumerate(subplots):
        splot.grid(
            True,  # passed positionally; the keyword was b= in older matplotlib
            which="major",
            color="gray",
            linestyle="-",
            alpha=0.25,
            zorder=1,
            lw=0.5,
        )
        splot.set_ylim(0, 0.15)
        splot.set_xlim(0, 50)
        last_row = m * n - s < n + 1
        first_in_row = s % n == 0
        if last_row:
            splot.set_xlabel("X-axis label", labelpad=8, fontsize=fz)
        if first_in_row:
            splot.set_ylabel("Y-axis label", labelpad=8, fontsize=fz)
    return (fig, subplots)


with PdfPages("auto_subplotting_colors.pdf") as pdf:
    fig, subplots = new_page(m, n)

    for sample in range(datasize):
        splot = subplots[splot_index]
        splot_index += 1
        scaled_y = np.random.randint(20, 30)
        random_data = np.random.poisson(scaled_y, 100)
        splot.hist(
            random_data,
            bins=12,
            density=True,  # normalize to a probability density (formerly normed=True)
            zorder=2,
            alpha=0.99,
            fc="white",
            lw=0.75,
            ec=colors.pop(),
        )
        splot.set_title("Sample {}".format(sample + 1), fontsize=fz)
        # tick fontsize & spacing
        splot.xaxis.set_tick_params(pad=4, labelsize=6)
        splot.yaxis.set_tick_params(pad=4, labelsize=6)

        # make new page:
        if splot_index == m * n:
            pdf.savefig()
            plt.close(fig)
            fig, subplots = new_page(m, n)

    # save the last, partially filled page (skipped if it is still empty)
    if splot_index > 0:
        pdf.savefig()
    plt.close(fig)
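As a possible variation, the global counter can be avoided by slicing the samples into page-sized chunks up front. This is only a sketch under assumptions: the helper name plot_paged, the seeded random data and the choice to hide unused axes on the last page (instead of leaving them blank, as the listings above do) are not from the original answer.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: no GUI window is needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages


def plot_paged(samples, path, m, n):
    """Plot one histogram per sample, m*n per page; return the page count."""
    per_page = m * n
    with PdfPages(path) as pdf:
        for start in range(0, len(samples), per_page):
            chunk = samples[start:start + per_page]
            fig, axarr = plt.subplots(m, n, sharey="row", squeeze=False)
            axes = axarr.ravel()                # row-major list of axes
            for ax, data in zip(axes, chunk):
                ax.hist(data, bins=12, density=True)
            for ax in axes[len(chunk):]:        # unused slots on the last page
                ax.set_visible(False)
            pdf.savefig(fig)
            plt.close(fig)
        return pdf.get_pagecount()


rng = np.random.default_rng(0)
samples = [rng.poisson(25, 100) for _ in range(7)]   # placeholder data
paged_path = os.path.join(tempfile.mkdtemp(), "paged.pdf")
n_pages = plot_paged(samples, paged_path, 2, 3)      # 7 samples, 6 per page
```

Because the loop only starts a page when there is at least one sample for it, no trailing empty page is written when the sample count is an exact multiple of m * n.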