
python saving multiple subplot figures to pdf

I am new to Python and I am trying to save a large amount of data to a PDF with figures, using matplotlib's PdfPages and subplots. The problem is that I have hit a bottleneck I don't know how to solve. The code goes something like this:

import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

with PdfPages('myfigures.pdf') as pdf:
    for i in range(1000):
        f, axarr = plt.subplots(2, 3)
        plt.subplots(2, 3)
        axarr[0, 0].plot(x1, y1)
        axarr[1, 0].plot(x2, y2)

        pdf.savefig(f)
        plt.close('all')

Creating a figure on every iteration is very time-consuming, but if I create it outside the loop, the plots are not cleared between pages. Other options I tried, such as clear() or clf(), either didn't work or ended up creating multiple different figures. Does anyone have an idea how to restructure this so that it runs faster?

Victor asked Aug 14 '16 01:08




1 Answer

Multipage PDF appending w/ matplotlib

Create an 𝑚-row × 𝑛-column array of subplot axes for each PDF page, and save (append) the page once its array of subplots is completely full; then create a new page and repeat.

To fit a large number of subplots into a single multipage PDF, start filling the first page with your plots right away. Create a new page as soon as the latest subplot added in your plotting loop fills the last free slot in the current page's 𝑚 × 𝑛 subplot layout.
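
The page-full check can also be phrased as plain index arithmetic. As a minimal sketch (the helper name `page_slot` is hypothetical and not part of matplotlib), `divmod` maps a running 0-based sample index to a page number and a row/column position in the 𝑚 × 𝑛 grid, assuming the subplots are filled row by row:

```python
def page_slot(sample_index, m, n):
    """Map a 0-based sample index to (page, row, col) for an m x n grid.

    Hypothetical helper for illustration; assumes row-major filling.
    """
    page, slot = divmod(sample_index, m * n)  # which page, which slot on it
    row, col = divmod(slot, n)                # position inside the grid
    return page, row, col


# With a 4 x 5 grid, sample 19 lands in the last slot of page 0
# and sample 20 in the first slot of page 1.
print(page_slot(19, 4, 5))  # (0, 3, 4)
print(page_slot(20, 4, 5))  # (1, 0, 0)
```

A new page is needed exactly when the slot index wraps back to zero, which is the same condition as a counter reaching m * n.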

Here's a way to do it where the dimensions (𝑚 × 𝑛) controlling the number of subplots per page can easily be changed:

    import matplotlib
    import matplotlib.pyplot as plt
    import numpy as np
    from matplotlib.backends.backend_pdf import PdfPages


    matplotlib.rcParams.update({"font.size": 6})

    # Dimensions for any m-rows × n-cols array of subplots / pg.
    m, n = 4, 5

    # Don't forget to indent after the with statement
    with PdfPages("auto_subplotting.pdf") as pdf:

        """Before beginning the iteration through all the data,
        initialize the layout for the plots and create a
        representation of the subplots that can be easily
        iterated over for knowing when to create the next page
        (and also for custom settings like partial axes labels)"""
        f, axarr = plt.subplots(m, n, sharex="col", sharey="row")
        arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
        subplots = [axarr[index] for index in arr_ij]

        # To conserve needed plotting real estate,
        # only label the bottom row and leftmost subplots
        # as determined automatically using m and n
        splot_index = 0
        for s, splot in enumerate(subplots):
            splot.set_ylim(0, 0.15)
            splot.set_xlim(0, 50)
            last_row = m * n - s < n + 1
            first_in_row = s % n == 0
            if last_row:
                splot.set_xlabel("X-axis label")
            if first_in_row:
                splot.set_ylabel("Y-axis label")

        # Iterate through each sample in the data
        for sample in range(33):

            # As a stand-in for real data, let's just make numpy take 100 random draws
            # from a poisson distribution centered around say ~25 and then display
            # the outcome as a histogram
            scaled_y = np.random.randint(20, 30)
            random_data = np.random.poisson(scaled_y, 100)
            subplots[splot_index].hist(
                random_data,
                bins=12,
                density=True,  # "normed" was removed in newer Matplotlib releases
                fc=(0, 0, 0, 0),
                lw=0.75,
                ec="b",
            )

            # Keep collecting subplots (into the mpl-created array; 
            # see: [1]) through the samples in the data and increment
            # a counter each time. The page will be full once the count is equal
            # to the product of the user-set dimensions (i.e. m * n)
            splot_index += 1

            """Once an mxn number of subplots have been collected 
            you now have a full page's worth, and it's time to 
            close and save to pdf that page and re-initialize for a
            new page possibly. We can basically repeat the same 
            exact code block used for the first layout 
            initialization, but with the addition of 3 new lines:
             +2 for creating & saving the just-finished pdf page,
             +1 more to reset the subplot index (back to zero)"""
            if splot_index == m * n:
                pdf.savefig()
                plt.close(f)
                f, axarr = plt.subplots(m, n, sharex="col", sharey="row")
                arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
                subplots = [axarr[index] for index in arr_ij]
                splot_index = 0
                for s, splot in enumerate(subplots):
                    splot.set_ylim(0, 0.15)
                    splot.set_xlim(0, 50)
                    last_row = (m * n) - s < n + 1
                    first_in_row = s % n == 0
                    if last_row:
                        splot.set_xlabel("X-axis label")
                    if first_in_row:
                        splot.set_ylabel("Y-axis label")

        # Done!
        # But don't forget to save the last, partially filled page
        # (skipped when the data ended exactly on a page boundary)
        if splot_index > 0:
            pdf.savefig()
        plt.close(f)

For any m×n layout, just change the values of m and n. With the code above (where "m, n = 4, 5"), 33 samples plotted on 4×5 pages of subplots produce a two-page PDF output file.

References

  1. Link to matplotlib subplots official docs.

Note: The final page of the multipage PDF will contain blank subplots whenever the total number of samples is not a multiple of 𝑚 × 𝑛. The number of blanks is 𝑚 × 𝑛 minus the remainder of the sample count divided by 𝑚 × 𝑛. E.g., with m=3 and n=4 you get 3 rows of 4 subplots, i.e. 12 per page; 20 samples then produce a two-page PDF with 24 subplots in total, of which the last 4 (the entire bottom row of the second page) are empty.
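
The arithmetic in the note can be checked with a few lines of Python. This is only a sketch: `page_stats` is a hypothetical helper name, and it assumes every page holds exactly m * n subplots and that no extra empty page is written when the data ends exactly on a page boundary:

```python
import math


def page_stats(total_samples, m, n):
    """Return (pages, blank_subplots) for an m x n grid of subplots per page.

    Hypothetical helper for illustration only.
    """
    per_page = m * n
    pages = math.ceil(total_samples / per_page)   # pages needed for all samples
    blanks = pages * per_page - total_samples     # unused slots on the last page
    return pages, blanks


print(page_stats(20, 3, 4))  # (2, 4)  the example from the note
print(page_stats(33, 4, 5))  # (2, 7)  the 4 x 5 layout with 33 samples
```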


Using seaborn

For a more advanced (& more "pythonic"*) extension of the implementation above, see below:

The multipage handling is better simplified by moving the page setup into a new_page function. It's better not to repeat code verbatim*, especially once you start customizing the plots, because you won't want to mirror every change in two places. A more customized look based on seaborn and the available matplotlib parameters, as shown below, might be preferable too.

Add a new_page function & some customizations for the subplot style:

    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns

    from matplotlib.backends.backend_pdf import PdfPages

    # this erases labels for any blank plots on the last page
    sns.set(font_scale=0.0)
    m, n = 4, 6
    datasize = 37 
    # 37 % (m*n) = 13, (m*n) - 13 = 24 - 13 = 11. Thus 11 blank subplots on final page
    
    # custom colors scheme / palette
    ctheme = [
        "k", "gray", "magenta", "fuchsia", "#be03fd", "#1e488f",
        (0.44313725490196076, 0.44313725490196076, 0.88627450980392153), "#75bbfd",
        "teal", "lime", "g", (0.6666674, 0.6666663, 0.29078014184397138), "y",
        "#f1da7a", "tan", "orange", "maroon", "r", ] # pick whatever colors you wish
    colors = sns.blend_palette(ctheme, datasize)
    fz = 7  # labels fontsize


    def new_page(m, n):
        global splot_index
        splot_index = 0
        fig, axarr = plt.subplots(m, n, sharey="row")
        plt.subplots_adjust(hspace=0.5, wspace=0.15)
        arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
        subplots = [axarr[index] for index in arr_ij]
        for s, splot in enumerate(subplots):
            splot.grid(
                True,  # grid on; passed positionally because the "b" keyword was renamed in newer Matplotlib
                which="major",
                color="gray",
                linestyle="-",
                alpha=0.25,
                zorder=1,
                lw=0.5,
            )
            splot.set_ylim(0, 0.15)
            splot.set_xlim(0, 50)
            last_row = m * n - s < n + 1
            first_in_row = s % n == 0
            if last_row:
                splot.set_xlabel("X-axis label", labelpad=8, fontsize=fz)
            if first_in_row:
                splot.set_ylabel("Y-axis label", labelpad=8, fontsize=fz)
        return (fig, subplots)


    with PdfPages("auto_subplotting_colors.pdf") as pdf:

        fig, subplots = new_page(m, n)

        for sample in range(datasize):
            splot = subplots[splot_index]
            splot_index += 1
            scaled_y = np.random.randint(20, 30)
            random_data = np.random.poisson(scaled_y, 100)
            splot.hist(
                random_data,
                bins=12,
                density=True,  # "normed" was removed in newer Matplotlib releases
                zorder=2,
                alpha=0.99,
                fc="white",
                lw=0.75,
                ec=colors.pop(),
            )
            splot.set_title("Sample {}".format(sample + 1), fontsize=fz)
            # tick fontsize & spacing
            splot.xaxis.set_tick_params(pad=4, labelsize=6)
            splot.yaxis.set_tick_params(pad=4, labelsize=6)

            # make new page:
            if splot_index == m * n:
                pdf.savefig()
                plt.close(fig)
                fig, subplots = new_page(m, n)

        if splot_index > 0:
            pdf.savefig()
            plt.close(fig)


John Collins answered Oct 20 '22 21:10
