Tabula extract tables by area coordinates

Tags: python, pdf, tabula

Tabula lets you extract tables from a PDF document by specifying the area coordinates of the table. On Windows, to get the coordinates you have to upload the PDF file to the Tabula web page, export the script that contains the coordinates, and then copy those coordinates into your code. On a Mac, you can just use the Preview app and its crop inspector. Are there any third-party programs or plug-ins that offer something similar for Windows users? I think this would be handy in the following situations:

When you do not have internet access.
When accuracy matters: I think the Preview app is more accurate, because I have had inaccurate coordinates from the Tabula web page.

I would be grateful if anyone could point me to such a tool. Many thanks.

asked Aug 02 '17 09:08

Eric Choi

2 Answers

Tabula expects areas in PDF units (points), which are defined as 1/72 of an inch. If you are using Acrobat Reader DC, you can use the Measure tool and multiply its readings in inches by 72.
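As a rough sketch of that conversion in Python (the helper name and the sample readings are made up for illustration, not taken from Tabula):

```python
POINTS_PER_INCH = 72  # one PDF unit (point) is 1/72 of an inch


def inches_to_points(inches: float) -> float:
    """Convert a Measure-tool reading in inches to PDF points."""
    return inches * POINTS_PER_INCH


# Hypothetical readings: the table starts 1.5 in below the top of the page
# and 0.75 in from the left edge.
top = inches_to_points(1.5)
left = inches_to_points(0.75)
print(top, left)  # 108.0 54.0
```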

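The area must be given as four values: top, left, bottom and right. All of them are distances from the top-left corner of the page: top and bottom are measured down from the top edge of the page to the top and bottom of the table, and left and right are measured across from the left edge of the page to the left and right sides of the table.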
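If your measuring tool reports the table as a left/top offset plus a width and height (an assumption about the tool, not something Tabula requires), a sketch like the following turns that into the four values and passes them to tabula-py's read_pdf. The helper names, the sample numbers and "report.pdf" are placeholders.

```python
from typing import List


def area_from_box(left: float, top: float, width: float, height: float) -> List[float]:
    """Return [top, left, bottom, right] in points from the page's top-left corner."""
    return [top, left, top + height, left + width]


def extract_table(pdf_path: str, page: int, area: List[float]):
    """Read one region of one page with tabula-py (needs tabula-py and Java installed)."""
    import tabula  # imported lazily so area_from_box works without tabula-py

    return tabula.read_pdf(pdf_path, pages=page, area=area)


# Hypothetical measurements in points.
area = area_from_box(left=54.0, top=108.0, width=486.0, height=252.0)
print(area)  # [108.0, 54.0, 360.0, 540.0]

# Uncomment to run against a real file ("report.pdf" is a placeholder name):
# tables = extract_table("report.pdf", page=1, area=area)
```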

answered Sep 22 '22 16:09

Manuel Aristarán

Acrobat Reader only allows measurements if the creator of the PDF has enabled them. I found this instead: https://graphicdesign.stackexchange.com/a/81666

Brief steps:

Download SumatraPDF. It is also available as a zip, so no install is needed.
Open the PDF with SumatraPDF.
Press 'm'. This shows the cursor position in the top-left corner of the window.
Run Tabula with the options -p for the page and -a for the area (top,left,bottom,right).
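For the last step, here is a minimal Python sketch that builds the command line from the four readings. The -p and -a options are the ones named above; the jar file name "tabula.jar", the file "report.pdf" and the helper name are placeholders, so adjust them to your own setup.

```python
from typing import List, Sequence


def tabula_command(pdf_path: str, page: int, area: Sequence[float],
                   jar_path: str = "tabula.jar") -> List[str]:
    """Build the argument list for the Tabula command-line tool.

    `area` is (top, left, bottom, right) in points; `jar_path` is a
    placeholder for wherever your Tabula jar actually lives.
    """
    top, left, bottom, right = area
    area_arg = ",".join(str(v) for v in (top, left, bottom, right))
    return ["java", "-jar", jar_path, "-p", str(page), "-a", area_arg, pdf_path]


# Hypothetical readings taken with the 'm' cursor readout.
cmd = tabula_command("report.pdf", page=1, area=(108, 54, 360, 540))
print(" ".join(cmd))  # java -jar tabula.jar -p 1 -a 108,54,360,540 report.pdf

# To actually run it (requires Java and the jar):
# import subprocess
# subprocess.run(cmd, check=True)
```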

answered Sep 19 '22 16:09

Deepak Garud
