I have Python code that uses tabula-py to read a PDF, extract the text, and convert it to tabular form. But it gives me a warning:
Nov 15, 2017 3:40:23 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for .notdef (9) in font Helvetica
This warning does not come from Python itself: tabula-py is a wrapper around tabula-java, which is written in Java. So I cannot simply use -W ignore to suppress it.
Is there any way to remove or suppress the above warning?
What is tabula-py? tabula-py is a simple Python wrapper of tabula-java that extracts tables from a PDF file and converts them directly into DataFrames or JSON. It can also extract tables from a PDF and save them as TSV, CSV, or JSON files.
Installation. Before installing tabula-py, make sure a Java runtime is available in your environment. You can install tabula-py from PyPI with the pip command.
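As a rough sketch of the Java prerequisite mentioned above, the following checks whether a java executable is on the PATH before tabula-py is used. The helper name java_available is made up for illustration, and finding java on the PATH is only an approximation of "a Java runtime is installed".

```python
import shutil


def java_available():
    """Return True if a `java` executable can be found on the PATH.

    Hypothetical helper: a PATH lookup is only a heuristic for checking
    that the Java runtime tabula-py depends on is installed.
    """
    return shutil.which("java") is not None


if java_available():
    print("java found on PATH")
else:
    print("java not found on PATH; install a Java runtime before using tabula-py")
```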
The naming of the parsing methods inside Camelot (i.e. Lattice and Stream) was inspired by Tabula. Lattice is used to parse tables that have demarcated lines between cells, while Stream is used to parse tables that use whitespace between cells to simulate a table structure.
I'm the author of tabula-py. Setting silent=True suppresses the tabula-java logs.
see also:
https://github.com/chezou/tabula-py/blob/e11d6f0ac518810b6d92b60a815e34f32f6bf085/tabula/io.py#L65
https://tabula-py.readthedocs.io/en/latest/tabula.html#tabula.io.build_options
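A minimal sketch of how the silent option might be passed, assuming tabula-py and a Java runtime are installed. The wrapper name read_tables_quietly and the file name example.pdf are placeholders, not part of the library.

```python
def read_tables_quietly(pdf_path, **kwargs):
    """Call tabula.read_pdf with silent=True unless the caller overrides it.

    Hypothetical convenience wrapper; tabula is imported lazily so that
    defining this function does not require tabula-py to be installed.
    """
    import tabula

    # silent=True asks tabula-py to suppress the tabula-java log output.
    kwargs.setdefault("silent", True)
    return tabula.read_pdf(pdf_path, **kwargs)


# Example call (example.pdf is a placeholder path):
# tables = read_tables_quietly("example.pdf", pages="all")
```

Calling tabula.read_pdf("example.pdf", silent=True) directly does the same thing; the wrapper only saves repeating the argument.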