java -jar tika-app-1.10-SNAPSHOT.jar -m manu.pdf > output.txt
which successfully creates the text I need in the output file. What is the best way to call Tika from PHP to get the plain text of an uploaded file into PHP?
Searching around, I find the exec command, but I'm not sure what the easiest way to proceed is.
For running on a remote server, I suggest you use curl or Guzzle to call the address (you could also simply use file_get_contents) and pass it the URL for the API that will call Tika on the remote server.
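As a rough illustration of the remote approach, here is a minimal sketch using PHP's curl extension. It assumes the remote endpoint accepts the raw file via HTTP PUT and answers with plain text when asked for text/plain; the function name and the endpoint URL are hypothetical, so adapt them to whatever API actually fronts Tika on your server.

```php
<?php
// Hypothetical helper: streams a local file to a remote Tika-backed endpoint
// and returns the plain text the endpoint sends back.
function fetchTextFromRemoteTika(string $filePath, string $endpointUrl): string
{
    if (!is_readable($filePath)) {
        throw new InvalidArgumentException("Cannot read file: $filePath");
    }

    $handle = fopen($filePath, 'rb');
    $curl = curl_init($endpointUrl);
    curl_setopt_array($curl, [
        CURLOPT_UPLOAD         => true,                   // send the file as an HTTP PUT body
        CURLOPT_INFILE         => $handle,                // stream from disk instead of loading into memory
        CURLOPT_INFILESIZE     => filesize($filePath),
        CURLOPT_HTTPHEADER     => ['Accept: text/plain'], // assumption: endpoint honours this header
        CURLOPT_RETURNTRANSFER => true,                   // return the response body as a string
        CURLOPT_TIMEOUT        => 120,                    // large files can take a while
    ]);

    $body   = curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_RESPONSE_CODE);
    $error  = curl_error($curl);
    fclose($handle);

    if ($body === false || $status !== 200) {
        throw new RuntimeException("Remote Tika call failed (HTTP $status): $error");
    }

    return $body;
}
```

You would call it with the uploaded file's temporary path, for example fetchTextFromRemoteTika($_FILES['document']['tmp_name'], $url), where both the form field name and $url are placeholders.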
For running the parsing locally (Tika and PHP on the same server), I used Symfony/Process.
I'd personally discourage you from just using exec.
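One practical reason to prefer a process wrapper over a bare exec call is that you get stdout, stderr, and the exit code separately, and no shell is involved in parsing the arguments. The sketch below is not Symfony's Process component; it uses PHP's built-in proc_open (array commands need PHP 7.4 or later) to show the same idea. The function name is hypothetical.

```php
<?php
// Hypothetical helper: runs a command given as an argument array (no shell),
// and returns its stdout, stderr, and exit code.
function runProcess(array $command): array
{
    // stderr goes to a temporary file so that reading stdout cannot block on it.
    $stderrFile = tmpfile();
    $descriptors = [
        0 => ['pipe', 'r'], // child's stdin
        1 => ['pipe', 'w'], // child's stdout
        2 => $stderrFile,   // child's stderr
    ];

    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        fclose($stderrFile);
        throw new RuntimeException('Could not start process: ' . implode(' ', $command));
    }

    fclose($pipes[0]);                        // nothing to send on stdin
    $stdout = stream_get_contents($pipes[1]); // read until the child closes stdout
    fclose($pipes[1]);
    $exitCode = proc_close($process);         // waits for the child to finish

    rewind($stderrFile);
    $stderr = stream_get_contents($stderrFile);
    fclose($stderrFile);

    return ['stdout' => $stdout, 'stderr' => $stderr, 'exitCode' => $exitCode];
}
```

For Tika, the command array would look something like ['java', '-jar', $jarPath, '--text', $uploadedPath], where both paths are yours to supply.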
I would add that having Tika on another server forces you to send that server the whole file uploaded by the user. A faster solution is to receive the upload in PHP and call the Tika process directly from the same script (or at least from the same machine). Otherwise you need a script that:
As I highlighted, communication between the two servers adds a lot of overhead, and that is not desirable when the file to parse is, say, a 35 MB PDF, is it? The user would have to wait, let's say, 2 minutes for the upload, plus another 20 seconds to send the file to the Tika server, and then another 3 seconds to get the parsed text back.
I strongly suggest staying on the same PHP server.
If it is on your own managed servers, and both the PHP and Tika locations are known to you, just use exec.
Or, if you prefer better control (which I suspect you do not need), use shell_exec.
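A minimal sketch of both variants follows, assuming java is on the PATH of the PHP user. The helper names are hypothetical. Note that the sketch passes --text, which Tika's command line documents as the plain-text output option; the command in the question uses -m, so swap the flag if that is what gives you the output you want. Every path is passed through escapeshellarg, which matters because the file name comes from a user upload.

```php
<?php
// Hypothetical helper: builds the Tika command line with shell-escaped paths.
function buildTikaCommand(string $jarPath, string $filePath): string
{
    return 'java -jar ' . escapeshellarg($jarPath)
        . ' --text ' . escapeshellarg($filePath);
}

// Variant 1: exec() collects output line by line and reports the exit code.
function extractTextWithExec(string $jarPath, string $filePath): string
{
    $lines = [];
    $exitCode = 0;
    exec(buildTikaCommand($jarPath, $filePath), $lines, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("Tika exited with code $exitCode");
    }
    // exec() strips trailing whitespace from each line, so rejoin with newlines.
    return implode("\n", $lines);
}

// Variant 2: shell_exec() returns the whole output as one string, but no exit code.
function extractTextWithShellExec(string $jarPath, string $filePath): string
{
    $text = shell_exec(buildTikaCommand($jarPath, $filePath));
    if ($text === null || $text === false) {
        throw new RuntimeException('Tika produced no output or could not be started');
    }
    return $text;
}
```

A call would look like extractTextWithExec('/path/to/tika-app.jar', $_FILES['document']['tmp_name']), with the jar path and form field name replaced by your own.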
If you run into performance issues, or need to scale this, then there is room for a more elaborate solution.