 

Adding metadata in EPS file using java

I'm currently reading and writing an .EPS file to add/manipulate metadata (keywords and tags) in it.

PS: The file encoding is Windows-1251 (Cp1251, Russian).

I'm reading the EPS file like this (lines is a global List<String>):

try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file), "Cp1251"))) {
    String line;
    while((line = br.readLine()) != null) {
        if(line.contains("</xmpTPg:SwatchGroups>")) {
            lines.add(line);
            lines.add(descriptionKwrds);
        }
        else
            lines.add(line);
        System.out.println(line);
    }
} catch (FileNotFoundException ex) {
    Logger.getLogger(script.class.getName()).log(Level.SEVERE, null, ex);
} catch (UnsupportedEncodingException ex) {
    Logger.getLogger(script.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
    Logger.getLogger(script.class.getName()).log(Level.SEVERE, null, ex);
}

In the code above, descriptionKwrds is the metadata (tags) that I want to add to the EPS file, for example:

String descriptionKwrds = "<photoshop:AuthorsPosition>icon vector illustration symbol bubble sign</photoshop:AuthorsPosition>";

And I'm writing the EPS file like this:

try {
    try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file.getName()), "Cp1251"))) {
        for(String s : lines)
            out.write(s + "\n");
        out.flush();
    }
} catch (FileNotFoundException ex) {
    Logger.getLogger(script.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
    Logger.getLogger(script.class.getName()).log(Level.SEVERE, null, ex);
}

The file reads and writes without errors, but when I open the newly generated file, it says that the file is corrupted.

The files before and after manipulation are file1 and file2, respectively. I'm using EPS Converter online to open the EPS files.

How can I achieve this?

asked Jun 11 '19 06:06 by youpilat13



1 Answer

OK, your problem is that your EPS file is an 'EPS with preview'. In addition to the actual PostScript program, it contains a bitmap which any application placing the EPS on a page can use to display a 'preview' to the user.

The file begins with binary data like this:

C5 D0 D3 C6 20 00 00 00 DC 49 05 00 00 00 00 00
00 00 00 00 FC 49 05 00 AE AC 4D 00 FF FF 00 00

If you read Adobe Technical Note 5002 "Encapsulated PostScript File Format Specification" and look at page 23, you will see that it defines the DOS EPS Binary File Header, which begins with hex C5D0D3C6, just as your file does. So your file has a DOS header, which defines a preview.

Bytes 4-7 define the start of the PostScript section, and bytes 8-11 define its length (each field is a 32-bit little-endian integer, so 20 00 00 00 means 0x20). Bytes 12-15 are the start of the Metafile (0 in your case, so not present) and 16-19 are its byte length, again 0. Then bytes 20-23 hold the start of the TIFF representation, and bytes 24-27 hold the length of the TIFF. Finally, the header checksum is in the next two bytes (28-29); here it is 0xFFFF, which means 'ignore the checksum'. In this case the header has been padded out with two bytes (0x00) to make a total of 32 bytes, which is why the offset of the PostScript section is 0x20.
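As a rough illustration of the layout just described, here is a minimal Java sketch that reads those header fields. The class DosEpsHeader and its fields are hypothetical names (not from any library); the sketch assumes the whole file has already been read into a byte array and uses only java.nio.ByteBuffer in little-endian mode.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical holder for the fields of the DOS EPS binary header (little-endian). */
final class DosEpsHeader {
    // Bytes C5 D0 D3 C6 read as a little-endian 32-bit int
    static final int MAGIC = 0xC6D3D0C5;

    final long psStart, psLength, wmfStart, wmfLength, tiffStart, tiffLength;
    final int checksum;

    private DosEpsHeader(ByteBuffer b) {
        psStart = Integer.toUnsignedLong(b.getInt(4));    // bytes 4-7
        psLength = Integer.toUnsignedLong(b.getInt(8));   // bytes 8-11
        wmfStart = Integer.toUnsignedLong(b.getInt(12));  // bytes 12-15
        wmfLength = Integer.toUnsignedLong(b.getInt(16)); // bytes 16-19
        tiffStart = Integer.toUnsignedLong(b.getInt(20)); // bytes 20-23
        tiffLength = Integer.toUnsignedLong(b.getInt(24));// bytes 24-27
        checksum = Short.toUnsignedInt(b.getShort(28));   // bytes 28-29, 0xFFFF = ignore
    }

    /** Returns the parsed header, or null if the data does not start with the DOS EPS magic. */
    static DosEpsHeader parse(byte[] file) {
        if (file.length < 30) return null;
        ByteBuffer b = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        if (b.getInt(0) != MAGIC) return null;
        return new DosEpsHeader(b);
    }
}
```

Calling DosEpsHeader.parse(Files.readAllBytes(path)) on the original file should give psStart 0x20, psLength 0x0549DC and tiffStart 0x0549FC, matching the hex dump shown earlier; a null result means the file has no DOS binary header (a 'pure' EPS).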

Your problem is that you have added content to the PostScript section (therefore increasing its size) but have not updated the file header with the new length of the PostScript section or the new position of the preview, so no EPS consumer will be able to strip the preview correctly. In effect, you have corrupted the PostScript program.

You need either to update the file header, or to strip the preview bitmap by removing the file header and trimming the bitmap off the end, producing a 'pure' EPS file (i.e. one with no preview).
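A minimal sketch of the second option (stripping the preview), assuming the file fits in memory and the header is laid out as described above. EpsPreviewStripper and stripPreview are hypothetical names; the method just copies out the PostScript section named by bytes 4-11, dropping the header and any preview.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: turns an 'EPS with preview' into a pure EPS. */
final class EpsPreviewStripper {
    /** Returns only the PostScript section; input without a DOS header is returned unchanged. */
    static byte[] stripPreview(byte[] file) {
        if (file.length < 30) return file;
        ByteBuffer b = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        if (b.getInt(0) != 0xC6D3D0C5) return file; // no C5 D0 D3 C6 magic: already pure EPS
        long start = Integer.toUnsignedLong(b.getInt(4));  // PostScript offset
        long length = Integer.toUnsignedLong(b.getInt(8)); // PostScript length
        if (start + length > file.length) {
            throw new IllegalArgumentException("PostScript section extends past end of file");
        }
        return Arrays.copyOfRange(file, (int) start, (int) (start + length));
    }

    /** Reads in, strips header and preview, writes the result to out (all as raw bytes). */
    static void stripPreview(Path in, Path out) throws IOException {
        Files.write(out, stripPreview(Files.readAllBytes(in)));
    }
}
```

This only removes the preview. It cannot repair a PostScript section that has already been damaged, so it should be run on the original file (or on bytes edited in binary mode).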

I almost forgot to add a clarification: you are not updating 'keywords' or 'tags' in the EPS file. You are adding PostScript-language program code which executes PostScript operators. In this case, when the file is run through a Distiller-like PostScript interpreter (that is, one which produces PDF as output), the resulting PDF file will have its metadata altered. You aren't altering the metadata of the EPS at all (that's done with the comments in the header). For a PostScript consumer which is not a Distiller, the changes you have made will have no effect at all.

[Update]

Modifying the header of 'file2' (that is the file which has had pdfmarks added) like this:

C5 D0 D3 C6 20 00 00 00 32 26 05 00 00 00 00 00
00 00 00 00 52 26 05 00 AE AC 4D 00 FF FF 00 00

Results in a working file. It seems that the modifications actually made the file shorter. The original size of the PostScript section was 0x0549DC and the offset of the TIFF bitmap was 0x0549FC. After modification the size of the PostScript section is 0x052632 and the offset of the TIFF bitmap is 0x052652.
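The arithmetic behind those values: the TIFF offset is the PostScript offset plus the PostScript length (0x20 + 0x052632 = 0x052652), so whatever the PostScript section gains or loses, the preview shifts by the same amount. A minimal sketch of that header patch follows; EpsHeaderUpdater and withNewPostScriptLength are hypothetical names, and it assumes (as in this file) that any preview sections come after the PostScript section. It also writes 0xFFFF into the checksum field so the checksum is ignored.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper that patches a DOS EPS header after the PostScript section changes size. */
final class EpsHeaderUpdater {
    static byte[] withNewPostScriptLength(byte[] header, long newPsLength) {
        byte[] copy = header.clone(); // leave the caller's array untouched
        ByteBuffer b = ByteBuffer.wrap(copy).order(ByteOrder.LITTLE_ENDIAN);
        long psStart = Integer.toUnsignedLong(b.getInt(4));
        long oldPsLength = Integer.toUnsignedLong(b.getInt(8));
        long delta = newPsLength - oldPsLength; // may be negative if the section shrank
        b.putInt(8, (int) newPsLength);
        // Shift the WMF (bytes 12-15) and TIFF (bytes 20-23) offsets if they follow the PostScript
        for (int offsetField : new int[] {12, 20}) {
            long start = Integer.toUnsignedLong(b.getInt(offsetField));
            if (start != 0 && start >= psStart + oldPsLength) {
                b.putInt(offsetField, (int) (start + delta));
            }
        }
        b.putShort(28, (short) 0xFFFF); // 0xFFFF means 'ignore the checksum'
        return copy;
    }
}
```

Applied to the original 32-byte header with a new length of 0x052632, this should reproduce the modified header bytes shown above.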

I have a sneaking suspicion that this is due to CR/LF translation (BufferedReader.readLine() discards each line terminator and your writer then appends a single "\n", so every CR/LF pair loses a byte). If so, this will also have corrupted the TIFF bitmap stored at the end of the file (I notice the binary at the end does indeed appear to be different). You need to read and write this file as a binary file, not as text.
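A minimal sketch of doing the edit in binary: read the whole file as bytes, insert the Cp1251-encoded snippet directly after the marker inside the PostScript section, copy everything else (including the preview) through untouched, and patch the header. EpsMetadataInserter and insertAfterMarker are hypothetical names. The sketch assumes the file fits in memory and is under 2 GB, and it does not check whether anything inside the PostScript itself records a byte count covering the inserted region. Inserting right after the closing tag, with no newline, avoids guessing the file's line-ending convention.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;

/** Hypothetical helper: binary-safe insertion into the PostScript section of an EPS file. */
final class EpsMetadataInserter {
    private static final int DOS_EPS_MAGIC = 0xC6D3D0C5; // C5 D0 D3 C6, little-endian

    /** Index just past the first occurrence of needle in hay[from, to), or -1. */
    private static int indexAfter(byte[] hay, byte[] needle, int from, int to) {
        outer:
        for (int i = from; i + needle.length <= to; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (hay[i + j] != needle[j]) continue outer;
            }
            return i + needle.length;
        }
        return -1;
    }

    static byte[] insertAfterMarker(byte[] file, String marker, String snippet, Charset cs) {
        byte[] m = marker.getBytes(cs);
        byte[] s = snippet.getBytes(cs);
        ByteBuffer hb = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        boolean dos = file.length >= 30 && hb.getInt(0) == DOS_EPS_MAGIC;
        int psStart = dos ? hb.getInt(4) : 0;
        int psEnd = dos ? psStart + hb.getInt(8) : file.length;
        int at = indexAfter(file, m, psStart, psEnd); // search only the PostScript section
        if (at < 0) throw new IllegalArgumentException("marker not found in PostScript section");

        ByteArrayOutputStream out = new ByteArrayOutputStream(file.length + s.length);
        out.write(file, 0, at);                // header + PostScript up to and including marker
        out.write(s, 0, s.length);             // new bytes, verbatim (no line-ending translation)
        out.write(file, at, file.length - at); // rest of PostScript + preview, byte for byte
        byte[] result = out.toByteArray();

        if (dos) {
            ByteBuffer rb = ByteBuffer.wrap(result).order(ByteOrder.LITTLE_ENDIAN);
            rb.putInt(8, psEnd - psStart + s.length);  // new PostScript length
            for (int field : new int[] {12, 20}) {     // WMF and TIFF offsets
                int start = rb.getInt(field);
                if (start != 0 && start >= psEnd) rb.putInt(field, start + s.length);
            }
            rb.putShort(28, (short) 0xFFFF);           // 0xFFFF = ignore the checksum
        }
        return result;
    }
}
```

Usage would be along the lines of Files.write(outPath, EpsMetadataInserter.insertAfterMarker(Files.readAllBytes(inPath), "</xmpTPg:SwatchGroups>", descriptionKwrds, Charset.forName("windows-1251"))), with no Reader or Writer involved at any point.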

answered Nov 11 '22 01:11 by KenS
