
How do you extract color profiles from a PDF file using PDFBox (or another open source Java library)?

Tags: java, pdf, pdfbox

Once you've loaded a document:

public static void main(String[] args) throws IOException {
    PDDocument doc = PDDocument.load(new File("blah.pdf"));

How do you get the page-by-page printing color intent from the PDDocument? I read the docs but didn't see this covered.

stevedbrown asked Nov 10 '16 07:11


2 Answers

This gets the output intents (you'll find these in high-quality PDF files) and also the ICC profiles for color spaces and images:

    // Uses the PDFBox 2.x API; IOUtils here is org.apache.pdfbox.io.IOUtils.
    int n = 0; // counter that keeps the generated file names unique
    try (PDDocument doc = PDDocument.load(new File("XXXXX.pdf")))
    {
        // 1. Output intents from the document catalog
        for (PDOutputIntent oi : doc.getDocumentCatalog().getOutputIntents())
        {
            COSStream destOutputIntent = oi.getDestOutputIntent();
            if (destOutputIntent == null)
            {
                continue; // no embedded destination profile
            }
            String info = oi.getOutputCondition();
            if (info == null || info.isEmpty())
            {
                info = oi.getInfo();
            }
            try (InputStream is = destOutputIntent.createInputStream();
                 FileOutputStream fos = new FileOutputStream(info + ".icc"))
            {
                IOUtils.copy(is, fos);
            }
        }
        for (int p = 0; p < doc.getNumberOfPages(); ++p)
        {
            PDResources res = doc.getPage(p).getResources();
            if (res == null)
            {
                continue; // page without a resource dictionary
            }
            // 2. ICC-based color spaces in the page resources
            for (COSName name : res.getColorSpaceNames())
            {
                PDColorSpace cs = res.getColorSpace(name);
                if (cs instanceof PDICCBased)
                {
                    try (InputStream is = ((PDICCBased) cs).getPDStream().createInputStream();
                         FileOutputStream fos = new FileOutputStream("p" + p + "-" + (n++) + ".icc"))
                    {
                        IOUtils.copy(is, fos);
                    }
                }
            }
            // 3. ICC-based color spaces of images in the page resources
            for (COSName name : res.getXObjectNames())
            {
                PDXObject x = res.getXObject(name);
                if (x instanceof PDImageXObject)
                {
                    PDColorSpace cs = ((PDImageXObject) x).getColorSpace();
                    if (cs instanceof PDICCBased)
                    {
                        try (InputStream is = ((PDICCBased) cs).getPDStream().createInputStream();
                             FileOutputStream fos = new FileOutputStream("p" + p + "-" + (n++) + ".icc"))
                        {
                            IOUtils.copy(is, fos);
                        }
                    }
                }
            }
        }
    }

What this doesn't do (but I could add some of it if needed):

  • colorspaces of shadings, patterns, xobject forms, appearance stream resources
  • recursion in colorspaces like DeviceN and Separation
  • recursion in patterns, xobject forms, soft masks
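
One practical wrinkle with the loops above: the same ICC profile can be referenced by several color spaces, images and pages, so identical files may be written more than once. A minimal sketch of one way around that, using only the JDK, is to hash each profile's bytes and write a file only the first time a hash is seen. The class name `ProfileDeduplicator`, the method `saveIfNew` and the hash-based file naming are assumptions made for illustration; none of them are part of PDFBox.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of PDFBox): writes each distinct profile once.
public class ProfileDeduplicator {
    private final Set<String> seen = new HashSet<>();
    private final Path outputDir;

    public ProfileDeduplicator(Path outputDir) {
        this.outputDir = outputDir;
    }

    // Reads the stream fully, then writes <sha256>.icc unless an identical
    // profile was already saved. Returns true if a new file was created.
    public boolean saveIfNew(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        byte[] data = buf.toByteArray();
        String hash = sha256Hex(data);
        if (!seen.add(hash)) {
            return false; // identical profile already saved
        }
        Files.write(outputDir.resolve(hash + ".icc"), data);
        return true;
    }

    // Hex-encoded SHA-256 digest of the given bytes.
    public static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every JVM
        }
    }
}
```

In the extraction code, each `IOUtils.copy(is, fos)` pair could then be replaced by something like `dedup.saveIfNew(is)`, with `is` still closed by the caller.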
Tilman Hausherr answered Oct 11 '22 11:10


I read the examples on "How to create/add Intents to a PDF file", but I couldn't find an example of "How to get intents". Using the API and examples, I wrote the following (untested code) to get the COSStream object for each of the intents. See if this is useful for you.

public static void main(String[] args) throws IOException {
  PDDocument doc = PDDocument.load(new File("blah.pdf"));

  PDDocumentCatalog cat = doc.getDocumentCatalog();
  List<PDOutputIntent> list = cat.getOutputIntents();

  for (PDOutputIntent e : list) {
    p("PDOutputIntent Found:");
    p("Info="+e.getInfo());
    p("OutputCondition="+e.getOutputCondition());
    p("OutputConditionIdentifier="+e.getOutputConditionIdentifier());
    p("RegistryName="+e.getRegistryName());
    // Stream holding the embedded ICC profile; null if the intent has none
    COSStream cstr = e.getDestOutputIntent();
    p("HasDestOutputProfile="+(cstr != null));
  }
  doc.close();
}

// Shorthand for System.out.println; must be declared outside main
static void p(String s) {
  System.out.println(s);
}
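
Once the bytes of a COSStream like the one above have been read into a byte array, the JDK itself can parse them, which is a quick way to check what kind of profile an intent carries. The following is a rough sketch under that assumption; the class name `IccProfileSummary` and its methods are hypothetical helpers, and only `java.awt.color.ICC_Profile` and `java.awt.color.ColorSpace` are real JDK classes.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_Profile;

// Hypothetical helper (not part of PDFBox): summarizes raw ICC profile bytes.
public class IccProfileSummary {

    // Maps the JDK's color space type constant to a readable name.
    public static String colorSpaceName(int type) {
        switch (type) {
            case ColorSpace.TYPE_RGB:  return "RGB";
            case ColorSpace.TYPE_CMYK: return "CMYK";
            case ColorSpace.TYPE_GRAY: return "Gray";
            case ColorSpace.TYPE_Lab:  return "Lab";
            default:                   return "other (" + type + ")";
        }
    }

    // Parses the profile and reports its color space and component count.
    // ICC_Profile.getInstance throws IllegalArgumentException on invalid data.
    public static String describe(byte[] iccBytes) {
        ICC_Profile profile = ICC_Profile.getInstance(iccBytes);
        return colorSpaceName(profile.getColorSpaceType())
                + ", " + profile.getNumComponents() + " components";
    }
}
```

For a quick check without a PDF at hand, the JDK's built-in sRGB profile can stand in for extracted data: `IccProfileSummary.describe(ICC_Profile.getInstance(ColorSpace.CS_sRGB).getData())`.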
blackpen answered Oct 11 '22 11:10