 

How can I write to a png/tiff file patch-by-patch?

I want to create a PNG or TIFF image file from a very large h5py dataset that cannot be loaded into memory all at once. So, I was wondering if there is a way in Python to write to a PNG or TIFF file in patches? (I can load the h5py dataset in slices into a numpy.ndarray.) I've tried the Pillow library, using PIL.Image.paste with box coordinates, but for large images it runs out of memory.

Basically, I'm wondering if there's a way to do something like:

for y in range(0, height, patch_size):
    for x in range(0, width, patch_size):
        y2 = min(y + patch_size, height)
        x2 = min(x + patch_size, width)
        # image_arr is an h5py dataset that cannot be loaded completely
        # in memory, so load it in slices
        image_file.write(image_arr[y:y2, x:x2], box=(y, x, y2, x2))

I'm looking for a way to do this, without having the whole image loaded into memory. I've tried the pillow library, but it loads/keeps all the data in memory.

Edit: This question is not about h5py, but rather about how extremely large images (that cannot be loaded into memory) can be written out to a file in patches - similar to how large text files can be constructed by writing to them line by line.

asked Jun 16 '18 by assassin

2 Answers

Try tifffile.memmap:

from tifffile import memmap

image_file = memmap('temp.tif', shape=(height, width), dtype=image_arr.dtype,
                    bigtiff=True)

for y in range(0, height, patch_size):
    for x in range(0, width, patch_size):
        y2 = min(y + patch_size, height)
        x2 = min(x + patch_size, width)
        image_file[y:y2, x:x2] = image_arr[y:y2, x:x2]

image_file.flush()

This creates an uncompressed BigTIFF file with one strip. Memory-mapped tiles are not implemented yet. I'm not sure how many libraries can handle that kind of file, but you can always read directly from the strip using the metadata in the TIFF tags.
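The pattern tifffile.memmap relies on - map the file, assign slices, flush - can be sketched with a plain numpy.memmap over a raw binary file. This is only an illustration of the underlying idea (the sizes and the temp.raw filename are made up; a real TIFF additionally needs its header and tags, which is what tifffile takes care of):

```python
import numpy as np

height, width, patch_size = 64, 64, 16

# A raw binary file on disk; only the pages actually touched are
# held in memory, so the full image never has to fit in RAM.
out = np.memmap('temp.raw', mode='w+', shape=(height, width), dtype=np.uint8)

for y in range(0, height, patch_size):
    for x in range(0, width, patch_size):
        y2 = min(y + patch_size, height)
        x2 = min(x + patch_size, width)
        # Stand-in for a slice read from the h5py dataset
        patch = np.full((y2 - y, x2 - x), 255, dtype=np.uint8)
        out[y:y2, x:x2] = patch

out.flush()  # write dirty pages back to disk
```

Each slice assignment touches only the mapped pages for that patch, which is why this scales to images far larger than memory.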

answered Sep 27 '22 by cgohlke

Short answer to "is there a way in Python to write to a PNG or TIFF file in patches?": yes, everything is possible in Python given enough time and skill to implement it. On the other hand, no, there is no ready-made solution for this - because it doesn't appear to be very useful.

I don't know much about TIFF, and a comment here says classic TIFF is limited to 4 GB, so that format is likely not a good candidate (though BigTIFF, as in the other answer, lifts that limit). PNG has no practical size limit and can be written in chunks, so it is doable in theory - on the condition that at least one scan line of your resulting image fits into memory.

If you really want to go ahead with this, here is the information you need. A PNG file consists of a few metadata chunks and a series of image data (IDAT) chunks. You can therefore construct a big image out of several smaller images - each of which contains a whole number of rows, a minimum of one row - by concatenating their image data and adding the needed metadata chunks (you can copy those from the first small image, except for the IHDR chunk, which must be rebuilt to contain the final image size). One caveat: the IDAT chunks of a PNG together hold a single zlib stream, so you cannot simply splice the complete compressed streams of the small images end to end; you need to merge them into one stream, for example by decompressing each small image's data and recompressing the concatenated raw scan lines as you go.

So, here is how I'd do it if I had to (note: you will need some understanding of Python's bytes type and of converting byte sequences to and from Python data types to pull this off):

  • Find how many rows fit into memory and make that the height of my "small image chunk". The width is the width of the entire final image. Let's call those width and small_height.

  • Go through my giant dataset in h5py one chunk at a time (width * small_height pixels), convert it to PNG, and save it to disk in a temporary file - or, if your image conversion library allows it, directly to a bytes string in memory. Then process the byte data as follows and delete it at the end:

    -- On the first iteration: walk through the PNG data one record at a time (see the PNG spec: http://www.libpng.org/pub/png/spec/1.2/png-1.2-pdg.html; it is in length-tag-value form and it is easy to write code that efficiently walks over the file record by record) and save ALL the records into my target file, except: modify IHDR to hold the final image size and skip the IEND record.

    -- On all subsequent iterations: scan through the PNG data, pick only the IDAT records, and write those out to the output file.

  • Append an IEND record to the target file.
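The record walker in the steps above can be sketched in pure Python with struct and zlib from the stdlib. The tiny 2x2 grayscale PNG below is constructed in memory purely for demonstration; a real implementation would read the small images from disk and would also verify the CRC of each record:

```python
import struct
import zlib

def chunk(ctype: bytes, data: bytes) -> bytes:
    # A PNG record: 4-byte big-endian length, 4-byte type,
    # data, then a CRC over type + data.
    return (struct.pack('>I', len(data)) + ctype + data
            + struct.pack('>I', zlib.crc32(ctype + data)))

# Build a minimal valid 2x2, 8-bit grayscale PNG for demonstration.
ihdr = struct.pack('>IIBBBBB', 2, 2, 8, 0, 0, 0, 0)
raw = b'\x00\x01\x02' + b'\x00\x03\x04'  # each row: filter byte + pixels
png = (b'\x89PNG\r\n\x1a\n'
       + chunk(b'IHDR', ihdr)
       + chunk(b'IDAT', zlib.compress(raw))
       + chunk(b'IEND', b''))

def walk_chunks(blob: bytes):
    """Yield (type, data) for each record in a PNG byte string."""
    pos = 8  # skip the 8-byte PNG signature
    while pos < len(blob):
        (length,) = struct.unpack_from('>I', blob, pos)
        ctype = blob[pos + 4:pos + 8]
        data = blob[pos + 8:pos + 8 + length]
        yield ctype, data
        pos += 12 + length  # length + type + data + CRC

types = [t for t, _ in walk_chunks(png)]
```

From here, the first-iteration / subsequent-iteration logic is just a matter of filtering on the yielded chunk types (keep everything once, then only IDAT).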

All done - you should now have a valid humongous PNG. I wonder who or what could read that, though.

answered Sep 27 '22 by Leo K