 

How can I write to a png/tiff file patch-by-patch?

I want to create a PNG or TIFF image file from a very large h5py dataset that cannot be loaded into memory all at once. So, I was wondering if there is a way in Python to write to a PNG or TIFF file in patches? (I can load the h5py dataset in slices into a numpy.ndarray.) I've tried the Pillow library, using PIL.Image.paste with box coordinates, but for large images it runs out of memory.

Basically, I'm wondering if there's a way to do something like:

for y in range(0, height, patch_size):
    for x in range(0, width, patch_size):
        y2 = min(y + patch_size, height)
        x2 = min(x + patch_size, width)
        # image_arr is an h5py dataset that cannot be loaded completely
        # in memory, so load it in slices
        image_file.write(image_arr[y:y2, x:x2], box=(y, x, y2, x2))

I'm looking for a way to do this, without having the whole image loaded into memory. I've tried the pillow library, but it loads/keeps all the data in memory.

Edit: This question is not about h5py, but rather about how extremely large images (that cannot be loaded into memory) can be written out to a file in patches - similar to how large text files can be constructed by writing to them line by line.

asked Jun 16 '18 by assassin

2 Answers

Try tifffile.memmap:

from tifffile import memmap

image_file = memmap('temp.tif', shape=(height, width), dtype=image_arr.dtype,
                    bigtiff=True)

for y in range(0, height, patch_size):
    for x in range(0, width, patch_size):
        y2 = min(y + patch_size, height)
        x2 = min(x + patch_size, width)
        image_file[y:y2, x:x2] = image_arr[y:y2, x:x2]

image_file.flush()

This creates an uncompressed BigTIFF file with one strip. Memory-mapped tiles are not implemented yet. I'm not sure how many libraries can handle that kind of file, but you can always read directly from the strip using the metadata in the TIFF tags.
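The pattern tifffile.memmap relies on - map the file, assign slices, flush - can be sketched with a plain numpy.memmap over a raw binary file. This is only an illustration of the underlying idea (the sizes and the temp.raw filename are made up; a real TIFF additionally needs its header and tags, which is what tifffile takes care of):

```python
import numpy as np

height, width, patch_size = 64, 64, 16

# A raw binary file on disk; only the pages actually touched are
# held in memory, so the full image never has to fit in RAM.
out = np.memmap('temp.raw', mode='w+', shape=(height, width), dtype=np.uint8)

for y in range(0, height, patch_size):
    for x in range(0, width, patch_size):
        y2 = min(y + patch_size, height)
        x2 = min(x + patch_size, width)
        # Stand-in for a slice read from the h5py dataset
        patch = np.full((y2 - y, x2 - x), 255, dtype=np.uint8)
        out[y:y2, x:x2] = patch

out.flush()  # write dirty pages back to disk
```

Each slice assignment touches only the mapped pages for that patch, which is why this scales to images far larger than memory.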

answered Sep 27 '22 by cgohlke

Short answer to "is there a way in Python to write to a PNG or TIFF file in patches?": yes, everything is possible in Python given enough time and skill to implement it. On the other hand, no, there is no ready-made solution for this - because it doesn't appear to be very useful.

I don't know much about TIFF, and a comment here says classic TIFF is limited to 4 GB, so that format is likely not a good candidate (though BigTIFF, as in the other answer, lifts that limit). PNG has no practical size limit and can be written in chunks, so it is doable in theory - on the condition that at least one scan line of your resulting image fits into memory.

If you really want to go ahead with this, here is the information you need. A PNG file consists of a few metadata chunks and a series of image data (IDAT) chunks. You can therefore construct a big image out of several smaller images - each of which contains a whole number of rows, a minimum of one row - by concatenating their image data and adding the needed metadata chunks (you can copy those from the first small image, except for the IHDR chunk, which must be rebuilt to contain the final image size). One caveat: the IDAT chunks of a PNG together hold a single zlib stream, so you cannot simply splice the complete compressed streams of the small images end to end; you need to merge them into one stream, for example by decompressing each small image's data and recompressing the concatenated raw scan lines as you go.

So, here is how I'd do it if I had to (note: you will need some understanding of Python's bytes type and of converting byte sequences to and from Python data types to pull this off):

  • Find how many rows fit into memory and make that the height of my "small image chunk". The width is the width of the entire final image. Let's call those width and small_height.

  • Go through my giant dataset in h5py one chunk at a time (width * small_height pixels), convert it to PNG, and save it to disk in a temporary file - or, if your image conversion library allows it, directly to a bytes string in memory. Then process the byte data as follows and delete it at the end:

    -- On the first iteration: walk through the PNG data one record at a time (see the PNG spec: http://www.libpng.org/pub/png/spec/1.2/png-1.2-pdg.html; it is in length-tag-value form and it is easy to write code that efficiently walks over the file record by record) and save ALL the records into my target file, except: modify IHDR to hold the final image size and skip the IEND record.

    -- On all subsequent iterations: scan through the PNG data, pick only the IDAT records, and write those out to the output file.

  • Append an IEND record to the target file.
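The record walker in the steps above can be sketched in pure Python with struct and zlib from the stdlib. The tiny 2x2 grayscale PNG below is constructed in memory purely for demonstration; a real implementation would read the small images from disk and would also verify the CRC of each record:

```python
import struct
import zlib

def chunk(ctype: bytes, data: bytes) -> bytes:
    # A PNG record: 4-byte big-endian length, 4-byte type,
    # data, then a CRC over type + data.
    return (struct.pack('>I', len(data)) + ctype + data
            + struct.pack('>I', zlib.crc32(ctype + data)))

# Build a minimal valid 2x2, 8-bit grayscale PNG for demonstration.
ihdr = struct.pack('>IIBBBBB', 2, 2, 8, 0, 0, 0, 0)
raw = b'\x00\x01\x02' + b'\x00\x03\x04'  # each row: filter byte + pixels
png = (b'\x89PNG\r\n\x1a\n'
       + chunk(b'IHDR', ihdr)
       + chunk(b'IDAT', zlib.compress(raw))
       + chunk(b'IEND', b''))

def walk_chunks(blob: bytes):
    """Yield (type, data) for each record in a PNG byte string."""
    pos = 8  # skip the 8-byte PNG signature
    while pos < len(blob):
        (length,) = struct.unpack_from('>I', blob, pos)
        ctype = blob[pos + 4:pos + 8]
        data = blob[pos + 8:pos + 8 + length]
        yield ctype, data
        pos += 12 + length  # length + type + data + CRC

types = [t for t, _ in walk_chunks(png)]
```

From here, the first-iteration / subsequent-iteration logic is just a matter of filtering on the yielded chunk types (keep everything once, then only IDAT).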

All done - you should now have a valid humongous PNG. I wonder who or what could read that, though.

answered Sep 27 '22 by Leo K