
How to extract FlateDecoded Images from PDF with PDFSharp

Tags:

c#

pdfsharp

How do I extract FlateDecoded images (such as PNG) from a PDF document with PDFSharp?

I found this comment in one of the PDFSharp samples:

// TODO: You can put the code here that converts from the PDF internal image format to a
// Windows bitmap
// and use GDI+ to save it in PNG format.
// [...]
// Take a look at the file
// PdfSharp.Pdf.Advanced/PdfImage.cs to see how we create the PDF image formats.

Does anyone have a solution for this problem?

Thanks for your replies.

EDIT: Because I can't answer my own question within 8 hours, I'm adding this here:

Thanks for your very fast reply.

I added some code to the method "ExportAsPngImage", but I didn't get the results I wanted. It now extracts a few more images (PNG), but they have the wrong colors and are distorted.

Here's my current code:

        PdfSharp.Pdf.Filters.FlateDecode flate = new PdfSharp.Pdf.Filters.FlateDecode();
        byte[] decodedBytes = flate.Decode(bytes);

        System.Drawing.Imaging.PixelFormat pixelFormat;

        switch (bitsPerComponent)
        {
            case 1:
                pixelFormat = PixelFormat.Format1bppIndexed;
                break;
            case 8:
                pixelFormat = PixelFormat.Format8bppIndexed;
                break;
            case 24:
                pixelFormat = PixelFormat.Format24bppRgb;
                break;
            default:
                throw new Exception("Unknown pixel format " + bitsPerComponent);
        }

        Bitmap bmp = new Bitmap(width, height, pixelFormat);
        var bmpData = bmp.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.WriteOnly, pixelFormat);
        int length = (int)Math.Ceiling(width * bitsPerComponent / 8.0);
        for (int i = 0; i < height; i++)
        {
            int offset = i * length;
            int scanOffset = i * bmpData.Stride;
            // IntPtr.Add avoids the overflow that Scan0.ToInt32() can hit in a 64-bit process
            Marshal.Copy(decodedBytes, offset, IntPtr.Add(bmpData.Scan0, scanOffset), length);
        }
        bmp.UnlockBits(bmpData);
        using (FileStream fs = new FileStream(@"C:\Export\PdfSharp\" + String.Format("Image{0}.png", count), FileMode.Create, FileAccess.Write))
        {
            bmp.Save(fs, System.Drawing.Imaging.ImageFormat.Png);
        }

Is this the right approach, or should I do it another way? Thanks a lot!

der_chirurg asked Apr 05 '12 08:04


2 Answers

I know this answer might be a few years too late, but maybe it will help others.

In my case the distortion occurs because image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent) doesn't seem to return the correct value. As Vive la déraison pointed out in a comment under your question, Marshal.Copy copies the bytes as they are, but the bitmap expects them in BGR order. Reversing the bytes and rotating the bitmap by 180 degrees after Marshal.Copy does the job.

The resulting code looks like this:

    // Needs: using System; using System.Drawing; using System.Drawing.Imaging; using System.Linq;
    //        using System.Runtime.InteropServices; using PdfSharp.Pdf; using PdfSharp.Pdf.Advanced;
    private static void ExportAsPngImage(PdfDictionary image, ref int count)
    {
        int width = image.Elements.GetInteger(PdfImage.Keys.Width);
        int height = image.Elements.GetInteger(PdfImage.Keys.Height);

        // Let PdfSharp remove the stream's filters; fall back to decoding Flate manually.
        bool canUnfilter = image.Stream.TryUnfilter();
        byte[] decodedBytes;

        if (canUnfilter)
        {
            decodedBytes = image.Stream.Value;
        }
        else
        {
            var flate = new PdfSharp.Pdf.Filters.FlateDecode();
            decodedBytes = flate.Decode(image.Stream.Value);
        }

        // Infer the bits per pixel from the decoded size instead of trusting /BitsPerComponent.
        // This assumes the rows carry no padding bytes; otherwise it throws instead of looping forever.
        long pixelCount = (long)width * height;
        if (pixelCount == 0 || (decodedBytes.Length * 8L) % pixelCount != 0)
            throw new NotSupportedException("Cannot infer the pixel format from the stream length.");
        int bitsPerPixel = (int)((decodedBytes.Length * 8L) / pixelCount);

        System.Drawing.Imaging.PixelFormat pixelFormat;
        switch (bitsPerPixel)
        {
            case 1:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format1bppIndexed;
                break;
            case 8:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format8bppIndexed;
                break;
            case 16:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format16bppArgb1555;
                break;
            case 24:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
                break;
            case 32:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format32bppArgb;
                break;
            case 64:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format64bppArgb;
                break;
            default:
                throw new NotSupportedException("Unknown pixel format " + bitsPerPixel);
        }

        // The bitmap stores 24-bpp pixels as BGR. Reversing the whole buffer swaps the channel
        // order but also reverses the pixel order, which the 180° rotation below undoes.
        // Enumerable.Reverse makes a copy, so image.Stream.Value is left untouched.
        decodedBytes = Enumerable.Reverse(decodedBytes).ToArray();

        using (var bmp = new Bitmap(width, height, pixelFormat))
        {
            BitmapData bmpData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.WriteOnly, bmp.PixelFormat);
            int length = (int)Math.Ceiling(width * (bitsPerPixel / 8.0));
            for (int i = 0; i < height; i++)
            {
                // PDF rows are byte aligned; bitmap rows start bmpData.Stride bytes apart.
                int offset = i * length;
                IntPtr scanLine = IntPtr.Add(bmpData.Scan0, i * bmpData.Stride); // no ToInt32() overflow on 64-bit
                Marshal.Copy(decodedBytes, offset, scanLine, length);
            }
            bmp.UnlockBits(bmpData);
            bmp.RotateFlip(RotateFlipType.Rotate180FlipNone);
            bmp.Save(String.Format("exported_Images\\Image{0}.png", count++), System.Drawing.Imaging.ImageFormat.Png);
        }
    }

The code might need some optimisation, but it did export FlateDecoded images correctly in my case.
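Instead of guessing the bit depth from the stream length, it may help to know that in the PDF specification /BitsPerComponent is the number of bits per color component, not per pixel. A /DeviceRGB image with 8 bits per component therefore reports 8, although each pixel takes 24 bits, which would also explain why a `switch` on the raw value picks the wrong format. A minimal sketch of computing bits per pixel from the color space instead, assuming the color space family name (for example `/DeviceRGB`, or the first element when /ColorSpace is an array such as `[/Indexed ...]`) has already been read from the image dictionary. `PdfImageInfo` and its methods are made-up names, and ICCBased color spaces are not handled:

```csharp
using System;

public static class PdfImageInfo
{
    // Color components per pixel for common PDF color space families (name includes the slash).
    public static int ComponentsPerPixel(string colorSpaceName)
    {
        switch (colorSpaceName)
        {
            case "/DeviceGray":
            case "/CalGray":
            case "/Indexed": // one palette index per pixel
                return 1;
            case "/DeviceRGB":
            case "/CalRGB":
            case "/Lab":
                return 3;
            case "/DeviceCMYK":
                return 4;
            default:
                // e.g. /ICCBased, whose component count is the /N entry of the ICC stream
                throw new NotSupportedException("Unsupported color space " + colorSpaceName);
        }
    }

    // /BitsPerComponent counts bits per color component, so multiply by the component count.
    public static int BitsPerPixel(string colorSpaceName, int bitsPerComponent)
    {
        return ComponentsPerPixel(colorSpaceName) * bitsPerComponent;
    }
}
```

With this, the `switch` in the question could run on `PdfImageInfo.BitsPerPixel(colorSpace, bitsPerComponent)`, so 8-bit RGB data would land on the `24` case.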

New Tartarus answered Sep 28 '22 15:09


To get a Windows BMP, you just have to create a bitmap header and then copy the image data into the bitmap. PDF images are byte aligned (every row starts on a byte boundary), while Windows BMPs are DWORD aligned (every row starts on a DWORD boundary; a DWORD is 4 bytes for historical reasons). All the information you need for the bitmap header can be found in the filter parameters or calculated.
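If the alignment difference is ignored, every row after the first starts at the wrong offset and the image comes out skewed. A minimal sketch of the row-length arithmetic and a row-by-row copy into DWORD-aligned rows, assuming the decoded PDF data contains plain samples (no /Predictor bytes); `RowAlignment` and its methods are hypothetical names:

```csharp
using System;

public static class RowAlignment
{
    // Bytes per row in the PDF stream: rows are padded only up to the next byte.
    public static int PdfRowLength(int width, int bitsPerPixel)
    {
        return (width * bitsPerPixel + 7) / 8;
    }

    // Bytes per row in a Windows BMP: rows are padded up to a multiple of 4 bytes (a DWORD).
    public static int BmpStride(int width, int bitsPerPixel)
    {
        return ((width * bitsPerPixel + 31) / 32) * 4;
    }

    // Copies byte-aligned PDF rows into a new buffer whose rows are DWORD aligned.
    // The padding bytes at the end of each destination row stay zero.
    public static byte[] PadRowsToDword(byte[] pdfRows, int width, int height, int bitsPerPixel)
    {
        int srcRow = PdfRowLength(width, bitsPerPixel);
        int dstRow = BmpStride(width, bitsPerPixel);
        if (pdfRows.Length < srcRow * height)
            throw new ArgumentException("Not enough image data for the given size.", nameof(pdfRows));

        var result = new byte[dstRow * height];
        for (int y = 0; y < height; y++)
            Buffer.BlockCopy(pdfRows, y * srcRow, result, y * dstRow, srcRow);
        return result;
    }
}
```

For a 5-pixel-wide 24-bpp image, for example, a PDF row is 15 bytes while a BMP row is 16. GDI+ uses the same 4-byte alignment for `BitmapData.Stride`, which is why copying row by row with `i * bmpData.Stride` is needed when filling a `Bitmap` directly.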

For indexed images, the color palette is another Flate-encoded object in the PDF; you copy it into the BMP as well.
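In a PDF, an indexed image's /ColorSpace is an array `[/Indexed base hival lookup]`, and the lookup table holds `hival + 1` entries with one byte per component of the base color space. A BMP color table stores each entry as four bytes in blue, green, red, reserved order instead. A sketch of that conversion for an RGB base space, assuming the lookup bytes and `hival` have already been read from the PDF (and decoded, if the lookup is a stream); `PaletteConversion` is a made-up name:

```csharp
using System;

public static class PaletteConversion
{
    // Converts an /Indexed lookup table with a /DeviceRGB base (3 bytes per entry: R, G, B)
    // into a BMP color table of RGBQUAD entries (4 bytes each: B, G, R, reserved).
    public static byte[] RgbLookupToRgbQuads(byte[] lookup, int hival)
    {
        int entries = hival + 1; // an /Indexed palette has hival + 1 entries
        if (lookup.Length < entries * 3)
            throw new ArgumentException("Lookup table is shorter than (hival + 1) * 3 bytes.", nameof(lookup));

        var quads = new byte[entries * 4];
        for (int i = 0; i < entries; i++)
        {
            quads[i * 4 + 0] = lookup[i * 3 + 2]; // blue
            quads[i * 4 + 1] = lookup[i * 3 + 1]; // green
            quads[i * 4 + 2] = lookup[i * 3 + 0]; // red
            quads[i * 4 + 3] = 0;                 // reserved
        }
        return quads;
    }
}
```

When filling a GDI+ `Bitmap` rather than writing a BMP file, the same colors go into `bmp.Palette` instead: read the property, set `Entries[i]` to `Color.FromArgb(r, g, b)`, and assign the palette back, because the getter returns a copy.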

This has to be handled separately for each format (1 bpp, 8 bpp, 24 bpp, 32 bpp).
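Putting the header and the row padding together for the 24-bpp case, here is a minimal sketch that writes a BITMAPFILEHEADER and a BITMAPINFOHEADER, swaps RGB into the BGR order BMP uses, pads each row to 4 bytes and stores the rows bottom-up. It assumes /DeviceRGB data with 8 bits per component and no /Predictor in /DecodeParms; the other formats need their own handling (plus a color table for 1 and 8 bpp). `BmpWriter.EncodeRgb24` is a hypothetical helper, not part of PDFSharp:

```csharp
using System;
using System.IO;

public static class BmpWriter
{
    // Encodes 8-bit-per-component RGB data (rows byte aligned, top row first)
    // as a 24-bpp Windows BMP file and returns the file bytes.
    public static byte[] EncodeRgb24(byte[] rgb, int width, int height)
    {
        int srcRow = width * 3;                 // PDF row length for 24 bpp
        int stride = (srcRow + 3) & ~3;         // BMP rows are DWORD aligned
        int imageSize = stride * height;
        const int headerSize = 14 + 40;         // BITMAPFILEHEADER + BITMAPINFOHEADER

        if (rgb.Length < srcRow * height)
            throw new ArgumentException("Not enough pixel data.", nameof(rgb));

        using (var ms = new MemoryStream(headerSize + imageSize))
        using (var w = new BinaryWriter(ms))    // BinaryWriter always writes little-endian
        {
            // BITMAPFILEHEADER
            w.Write((byte)'B'); w.Write((byte)'M');
            w.Write(headerSize + imageSize);    // total file size
            w.Write((ushort)0); w.Write((ushort)0); // reserved
            w.Write(headerSize);                // offset of the pixel data

            // BITMAPINFOHEADER
            w.Write(40);                        // size of this header
            w.Write(width);
            w.Write(height);                    // positive height = rows stored bottom-up
            w.Write((ushort)1);                 // color planes
            w.Write((ushort)24);                // bits per pixel
            w.Write(0);                         // BI_RGB, no compression
            w.Write(imageSize);
            w.Write(0); w.Write(0);             // resolution not specified
            w.Write(0); w.Write(0);             // no color table for 24 bpp

            var row = new byte[stride];         // padding bytes stay zero
            for (int y = height - 1; y >= 0; y--) // BMP stores the bottom row first
            {
                int src = y * srcRow;
                for (int x = 0; x < width; x++)
                {
                    row[x * 3 + 0] = rgb[src + x * 3 + 2]; // blue
                    row[x * 3 + 1] = rgb[src + x * 3 + 1]; // green
                    row[x * 3 + 2] = rgb[src + x * 3 + 0]; // red
                }
                w.Write(row);
            }
            w.Flush();
            return ms.ToArray();
        }
    }
}
```

Used with the decoded stream from the code above, this could be saved with `File.WriteAllBytes("Image0.bmp", BmpWriter.EncodeRgb24(decodedBytes, width, height));`. Writing the file by hand avoids GDI+ entirely, at the cost of handling each pixel format yourself.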

I liked the old Stack Overflow answered Sep 28 '22 16:09