Can I use the Ghostscript API to convert a PDF to some other format without reading the data from disk or writing the results to disk? Disk access adds a lot of overhead!
I need something like this:
public static byte[][] ConvertPDF(byte[] pdfData)
{
// Should return an array of byte arrays, one per page
}
Since there still isn't a correct answer here all these years later, I'll provide one.
Ghostscript performs its operations on disk. It doesn't use an input and output path merely to load the file into memory, perform operations, and write it back. It actually reads and writes parts of the file to disk as it goes (using multiple threads). While this IS slower, it also uses much less memory (bearing in mind that these files could potentially be quite large).
Because the operations are performed on disk, there was not (at the time of this question) any way to pass in or retrieve a byte array or memory stream. Offering one would have been "dishonest": it might imply a "shortcut" that avoids disk IO when in fact it would not. Later, support was added to accept and return memory streams, but it's important to note that this support merely accepts the memory stream, writes it to a temporary file, performs the operations, and then reads the result back into a new memory stream.
If that still meets your needs (for example, if you want the inevitable IO to be handled by the library rather than your business logic), here are a couple of links demonstrating how to go about it (the mechanics vary with your exact needs).
Image to PDF (memory stream to memory stream via rasterizer)
Image to PDF (file to memory stream via processor)
PDF to image (memory stream to memory stream via rasterizer)
Hopefully these will, collectively, provide enough information to solve this issue for others who, like me and the OP, mostly found people saying it was impossible and that they shouldn't even be trying.
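For reference, here is a minimal sketch of the temporary-file round trip described above, written against the signature from the question. It shells out to the Ghostscript command-line executable rather than using any .NET wrapper, so it is not the approach from the linked examples. The names PdfConverter and BuildArguments, the gsExecutable default ("gs", typically "gswin64c" on Windows), the png16m output device, and the 150 dpi default are all assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

public static class PdfConverter
{
    // Hypothetical helper: builds the Ghostscript command line.
    // %04d in the output pattern makes Ghostscript write one file per page.
    public static string BuildArguments(string inputPath, string outputPattern, int dpi)
    {
        return $"-q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -r{dpi} " +
               $"-sOutputFile=\"{outputPattern}\" \"{inputPath}\"";
    }

    // Returns one PNG-encoded byte array per page.
    // The disk IO still happens; it is only hidden from the caller.
    public static byte[][] ConvertPDF(byte[] pdfData, string gsExecutable = "gs", int dpi = 150)
    {
        // Private scratch directory so concurrent calls cannot collide.
        string workDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(workDir);
        try
        {
            // Step 1: write the in-memory PDF to a temporary file.
            string inputPath = Path.Combine(workDir, "input.pdf");
            File.WriteAllBytes(inputPath, pdfData);

            // Step 2: let Ghostscript render each page to its own file.
            var startInfo = new ProcessStartInfo
            {
                FileName = gsExecutable,
                Arguments = BuildArguments(inputPath, Path.Combine(workDir, "page-%04d.png"), dpi),
                UseShellExecute = false,
                CreateNoWindow = true
            };
            using (Process process = Process.Start(startInfo))
            {
                process.WaitForExit();
                if (process.ExitCode != 0)
                {
                    throw new InvalidOperationException(
                        $"Ghostscript exited with code {process.ExitCode}.");
                }
            }

            // Step 3: read the pages back into memory in page order.
            // Zero-padded names sort correctly with an ordinal comparison.
            return Directory.GetFiles(workDir, "page-*.png")
                .OrderBy(path => path, StringComparer.Ordinal)
                .Select(path => File.ReadAllBytes(path))
                .ToArray();
        }
        finally
        {
            // Step 4: remove the temporary files whether or not rendering succeeded.
            Directory.Delete(workDir, recursive: true);
        }
    }
}
```

Calling PdfConverter.ConvertPDF(File.ReadAllBytes("some.pdf")) would then give the byte[][] the question asks for, at the cost of the same disk traffic as before. Swapping the device or resolution only requires changing the arguments string.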
Using the Ghostscript API you can send input from anywhere you like. Depending on the output device you choose you may be able to send the output to stdout, or to retrieve a bitmap in memory.
If you want TIFF output then you have to have an output file (Tagged Image File Format, the clue is in the name...).
Similarly, you can't stream PDF files as input. They have to be available as a file, because PDF is a random access format.
What leads you to think that this is a performance problem?