Remove illegal characters from string with PDFBox

Tags: java, pdfbox

When I try to write characters that the font's encoding does not support to a PDF, I get an exception. For example:

contentStream.showText("some illegal characters");    
...
java.lang.IllegalArgumentException: U+000A ('controlLF') is not available in this font Helvetica (generic: ArialMT) encoding: WinAnsiEncoding...

How can I find out which characters are not supported and strip them from the string?

asked Feb 14 '17 14:02 by ave4496


1 Answer

Here is my solution; at least it works for what I need. I used PDFBox's WinAnsiEncoding class and called its contains method to check whether each character is supported.

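import org.apache.pdfbox.pdmodel.font.encoding.GlyphList;
import org.apache.pdfbox.pdmodel.font.encoding.WinAnsiEncoding;

public class Test {

    // Returns a copy of the input without the characters WinAnsiEncoding cannot represent.
    public static String remove(String test) {
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < test.length(); i++) {
            char c = test.charAt(i);
            // contains(int) expects a code in the encoding, not a Unicode value,
            // so look the character up by its Adobe glyph name instead.
            String name = GlyphList.getAdobeGlyphList().codePointToName(c);
            if (WinAnsiEncoding.INSTANCE.contains(name)) {
                b.append(c);
            }
        }
        return b.toString();
    }

    public static void main(String[] args) {
        System.out.println(remove("abc\rcde"));
        // prints abccde
    }

}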
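
If you want a rough pre-filter that does not depend on PDFBox at all, something like the following might do. It is only a sketch under an assumption: it treats the JDK's windows-1252 charset as a stand-in for WinAnsiEncoding and additionally drops control characters such as LF and CR. The class name WinAnsiSanitizer and the method sanitize are made up for illustration, and the JDK charset and the PDFBox encoding table may not agree on every character, so the PDFBox-based check is the safer one when PDFBox is available.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of PDFBox.
class WinAnsiSanitizer {

    // Assumption: the JDK's windows-1252 charset approximates WinAnsiEncoding.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Returns the input without control characters and without
    // characters that windows-1252 cannot encode.
    static String sanitize(String text) {
        CharsetEncoder encoder = CP1252.newEncoder();
        StringBuilder kept = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isISOControl(c) && encoder.canEncode(c)) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("abc\rcde"));
        // prints abccde
    }
}
```

Because it works on single char values, characters outside the Basic Multilingual Plane are dropped as well, which is acceptable here since windows-1252 cannot represent them anyway.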

answered Nov 03 '22 21:11 by ave4496