 

Python-based document metadata parser?

Tags:

python

parsing

Does anyone know a good document metadata parser in Python for Unix-like systems? In Java, Apache Tika is great.

No COM... please :)

Thanks

jassinm asked Dec 14 '25 12:12


2 Answers

You don't have to use Jython to use Tika. You can call Java from Python using JCC. You can find decent instructions for this here.

When installing JCC, you'll have to apply one of the two provided patches to setuptools so that it can build shared objects. The c7 version worked for me on Ubuntu 10.04.

Another option is to use Python's subprocess module to run Tika and capture its stdout.
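A minimal sketch of the subprocess approach, assuming the standalone tika-app jar is installed and `java` is on the PATH. The jar path `/opt/tika/tika-app.jar` and the helper names `tika_command`, `parse_metadata`, and `tika_metadata` are hypothetical. The `--metadata` flag asks tika-app to print only metadata, and the parser assumes each output line looks like `Key: value` (keys such as `dc:title` contain a colon themselves, so it splits on colon-plus-space).

```python
import subprocess

# Hypothetical location of the Tika app jar; adjust to your installation.
TIKA_JAR = "/opt/tika/tika-app.jar"


def tika_command(path, jar=TIKA_JAR):
    """Build the command line asking tika-app for metadata only."""
    return ["java", "-jar", jar, "--metadata", path]


def parse_metadata(output):
    """Turn 'Key: value' lines into a dict (assumed output format).

    Splits on the first ': ' so that namespaced keys like 'dc:title'
    stay intact. If a key repeats, the last value wins.
    """
    metadata = {}
    for line in output.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that don't look like 'Key: value'
            metadata[key.strip()] = value.strip()
    return metadata


def tika_metadata(path, jar=TIKA_JAR):
    """Run Tika in a subprocess and return the document's metadata."""
    result = subprocess.run(
        tika_command(path, jar),
        capture_output=True,  # collect stdout/stderr (Python 3.7+)
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError if Tika fails
    )
    return parse_metadata(result.stdout)
```

Calling `tika_metadata("report.pdf")` on a (hypothetical) PDF would return something like a dict keyed by `Content-Type`, `dc:title`, and so on. Starting a JVM per file is slow, so for many documents the JCC route above may be the better fit.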

Kevin answered Dec 16 '25 08:12


If you like Tika, you could always use Jython, which lets you reference Tika's Java classes directly.

Hank Gay answered Dec 16 '25 08:12


