I'm trying to read the contents of a PDF using Apache's PDFBox and encode it in Base64 so I can stream it elsewhere. To encode it I use the Apache Commons Codec Base64OutputStream class, like so:
ByteArrayOutputStream byteOutput = new ByteArrayOutputStream();
Base64OutputStream base64Output = new Base64OutputStream(byteOutput);
List pages = pdfDocument.getDocumentCatalog().getAllPages();
Iterator iter = pages.iterator();
while (iter.hasNext()) {
    PDPage page = (PDPage) iter.next();
    PDResources resources = page.getResources();
    Map<String, PDXObjectImage> pageImages = resources.getImages();
    if (pageImages != null) {
        Iterator imageIter = pageImages.keySet().iterator();
        while (imageIter.hasNext()) {
            String key = (String) imageIter.next();
            PDXObjectImage image = (PDXObjectImage) pageImages.get(key);
            image.write2OutputStream(base64Output);
        }
    }
}
// Close the encoder so the last partial group and any '=' padding are written.
base64Output.close();
String base64 = new String(byteOutput.toByteArray());
It seems to be encoding it, but I need to verify that by writing a JUnit test that validates the Base64 string. The following assertion doesn't pass. Any thoughts?
assertTrue(content
.matches("^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$"));
Thanks in advance
By default, Base64OutputStream uses CHUNK_SIZE = 76 and CHUNK_SEPARATOR = {'\r', '\n'}, so the encoded output is split into 76-character lines separated by \r\n.
The regular expression you are using to test whether a string is Base64-encoded doesn't account for those separators.
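One way to see the effect of the separators is to strip them before applying the original pattern. The sketch below is only an illustration: it uses java.util.Base64.getMimeEncoder() from the JDK as a stand-in for Base64OutputStream (assuming both emit 76-character lines separated by \r\n), and the class and method names are made up for the example.

```java
import java.util.Base64;

class StripSeparatorsCheck {
    // The unchunked Base64 pattern from the question, unchanged.
    static final String UNCHUNKED =
            "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$";

    // Hypothetical helper: drop every CR and LF, then validate what is left.
    static boolean isBase64IgnoringLineBreaks(String encoded) {
        String joined = encoded.replaceAll("[\\r\\n]", "");
        return joined.matches(UNCHUNKED);
    }

    public static void main(String[] args) {
        // Stand-in for the stream output (assumption): 100 bytes encode to
        // 136 characters, so the MIME encoder inserts one \r\n after column 76.
        String chunked = Base64.getMimeEncoder().encodeToString(new byte[100]);

        // The separator in the middle makes the original pattern fail.
        System.out.println(chunked.matches(UNCHUNKED));

        // With the separators removed, the same pattern accepts the data.
        System.out.println(isBase64IgnoringLineBreaks(chunked));
    }
}
```

This only confirms that the payload is valid Base64; it does not check that the lines are 76 characters long.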
A regular expression that matches a chunked Base64 string (chunk size 76, i.e. 19 groups of 4 characters, and separator \r\n) could look like this:
"^(([\\w+/]{4}){19}\r\n)*(([\\w+/]{4})*([\\w+/]{4}|[\\w+/]{3}=|[\\w+/]{2}==))$"
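A chunk-aware check along the lines of the pattern above can be sketched as follows. It spells out the Base64 alphabet because \w also accepts '_', which is not a standard Base64 character. As assumptions for the example: java.util.Base64.getMimeEncoder() from the JDK stands in for Base64OutputStream (both are taken to produce 76-character lines separated by \r\n), the names ChunkedBase64Check and isChunkedBase64 are made up, and the optional trailing \r\n is there in case the encoder ends its output with a separator, which a pattern that stops right after the padding would reject.

```java
import java.util.Base64;
import java.util.regex.Pattern;

class ChunkedBase64Check {
    // One character of the standard Base64 alphabet (no '_' or '-').
    private static final String B64 = "[A-Za-z0-9+/]";

    // Zero or more full 76-character lines, each ending in CRLF, then a last
    // line of at most 76 characters that may end in '=' padding, then an
    // optional trailing CRLF.
    static final Pattern CHUNKED = Pattern.compile(
            "(?:" + B64 + "{76}\r\n)*"
            + "(?:" + B64 + "{4}){0,18}"
            + "(?:" + B64 + "{4}|" + B64 + "{3}=|" + B64 + "{2}==)"
            + "(?:\r\n)?");

    // Hypothetical helper: true if the whole string has the chunked layout.
    static boolean isChunkedBase64(String s) {
        return CHUNKED.matcher(s).matches();
    }

    public static void main(String[] args) {
        byte[] data = new byte[200];

        // 268 encoded characters: three full lines plus a 40-character tail.
        String chunked = Base64.getMimeEncoder().encodeToString(data);
        System.out.println(isChunkedBase64(chunked));          // true

        // The same data on a single 268-character line is not chunked.
        String singleLine = Base64.getEncoder().encodeToString(data);
        System.out.println(isChunkedBase64(singleLine));       // false
    }
}
```

In a JUnit test this could be used as assertTrue(ChunkedBase64Check.isChunkedBase64(content)). Unlike stripping the separators first, it also fails when a line is longer than 76 characters.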