I'm experimenting with creating PDF files from C# code. Working from the PDF specification, I have been able to produce a working PDF file by building strings of data and encoding them into byte arrays with the UTF-8 encoding.
The problem I run into is when I try to use the DeflateStream class on the PDF stream objects. It just doesn't seem to work.
Here is the text version of the PDF object in question (\r\n is at the end of each line, just not visible here):
5 0 obj
<</Length 45>>
stream
BT 70 50 TD /F1 12 Tf (Hello, world!) Tj ET
endstream
endobj
When I attempt to use the DeflateStream class to compress the line BT 70 50 TD /F1 12 Tf (Hello, world!) Tj ET, the PDF no longer seems to work. I noticed that a lot of other libraries, such as iTextSharp, use their own implementation of Deflate compression.
Is there any reason why Microsoft's implementation of the DeflateStream class isn't working? Am I using it incorrectly or is it implemented incorrectly or what?
I know that PDF files are binary (not text), but if I'm not encrypting anything then it is possible to view it all as text. Here is the entire PDF file for reference (in plaintext, also \r\n is at the end of each line, just not visible here):
%PDF-1.7
1 0 obj
<</Type /Catalog /Pages 2 0 R>>
endobj
2 0 obj
<</Type /Pages /MediaBox [ 0 0 200 200 ] /Count 1 /Kids [ 3 0 R ]>>
endobj
3 0 obj
<</Type /Page /Parent 2 0 R /Resources <</Font <</F1 4 0 R>>>> /Contents 5 0 R>>
endobj
4 0 obj
<</Type /Font /Subtype /Type1 /BaseFont /Times-Roman>>
endobj
5 0 obj
<</Length 45>>
stream
BT 70 50 TD /F1 12 Tf (Hello, world!) Tj ET
endstream
endobj
xref
0 6
0000000000 65535 f
0000000017 00000 n
0000000067 00000 n
0000000153 00000 n
0000000252 00000 n
0000000325 00000 n
trailer
<</Size 6/Root 1 0 R>>
startxref
422
%%EOF
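DeflateStream implements raw DEFLATE (RFC 1951), whereas PDF stream data is compressed using a method compatible with zlib (RFC 1950). The zlib format wraps the raw DEFLATE data in a two-byte header and a trailing four-byte Adler-32 checksum, neither of which DeflateStream writes. This is detailed, with a workaround, in this related Microsoft Connect bug report.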
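To make the format difference concrete, here is a minimal sketch of wrapping DeflateStream output in a zlib container by hand. The class and method names (ZlibSketch, CompressToZlib, Adler32) are made up for illustration and are not part of any library; the header bytes 0x78 0x9C are one common valid choice, assumed here rather than required.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not a library API: wraps raw DEFLATE data in zlib framing.
public static class ZlibSketch
{
    // Adler-32 checksum of the uncompressed data, as defined in RFC 1950.
    public static uint Adler32(byte[] data)
    {
        const uint Mod = 65521;
        uint a = 1, b = 0;
        foreach (byte value in data)
        {
            a = (a + value) % Mod;
            b = (b + a) % Mod;
        }
        return (b << 16) | a;
    }

    public static byte[] CompressToZlib(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            // zlib header: CMF = 0x78 (deflate, 32K window), FLG = 0x9C (no preset dictionary).
            output.WriteByte(0x78);
            output.WriteByte(0x9C);

            // leaveOpen: true keeps the MemoryStream usable after the DeflateStream is disposed.
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(input, 0, input.Length);
            }

            // Trailer: Adler-32 of the uncompressed data, most significant byte first.
            uint checksum = Adler32(input);
            output.WriteByte((byte)(checksum >> 24));
            output.WriteByte((byte)(checksum >> 16));
            output.WriteByte((byte)(checksum >> 8));
            output.WriteByte((byte)checksum);

            return output.ToArray();
        }
    }
}
```

The returned bytes would go between stream and endstream. The stream dictionary then also needs /Filter /FlateDecode, /Length must be the compressed byte count, and the startxref offset (plus the offset of any object placed after the stream) has to be recomputed because the file size changes. On .NET 6 and later, System.IO.Compression.ZLibStream writes this framing itself, so a hand-rolled wrapper like this should only be needed on older frameworks.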
A simple workaround would be to use a third-party compression library, such as DotNetZip, which supports the proper format. That said, the Connect report suggests that skipping the first two bytes (the zlib header) may make DeflateStream work in most cases; note that this applies when reading zlib data, not when writing it.
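The skip-two-bytes workaround for the reading direction might look roughly like the following. This is a sketch under assumptions: ZlibReadSketch and InflateZlib are hypothetical names, the input is assumed to be zlib data without a preset dictionary, and the trailing Adler-32 checksum is simply ignored rather than verified.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not a library API: reads zlib data with DeflateStream.
public static class ZlibReadSketch
{
    public static byte[] InflateZlib(byte[] zlibData)
    {
        // Start the stream at offset 2 so DeflateStream sees raw DEFLATE data
        // instead of the two-byte zlib header.
        using (var input = new MemoryStream(zlibData, 2, zlibData.Length - 2))
        using (var inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            // Decompression ends at the final DEFLATE block; the checksum
            // bytes that follow are left unread and unverified.
            inflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Because the checksum is never checked, corrupted input can go unnoticed here, which is one reason a library with real zlib support is the safer choice.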