 

Use Tika with Python: RuntimeError: Unable to start Tika Server


I am trying to use the tika package to parse files. Tika installed successfully, and I started tika-server-1.18.jar from cmd with java -jar tika-server-1.18.jar.

My code in Jupyter is:

```python
import tika
from tika import parser

parsed = parser.from_file('')  # file path omitted in the original post
```

However, I receive the following error:

```
2018-07-25 10:20:13,325 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:18,329 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:23,332 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:28,340 [MainThread ] [ERROR] Tika startup log message not received after 3 tries.
2018-07-25 10:20:28,340 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.

RuntimeError: Unable to start Tika Server.
```

Sha Li asked Jul 25 '18 08:07



People also ask

How do I start a tika server?

- GUI mode: Use the "--gui" (or "-g") option to start the Apache Tika GUI. You can drag and drop files from a normal file explorer into the GUI window to extract text content and metadata from the files.
- Server mode: Use the "--server" (or "-s") option to start the Apache Tika server.
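Before calling the parser, it can help to confirm that a server started this way is actually accepting connections. The following is a minimal sketch, assuming the server runs on localhost at tika-server's default port 9998; `is_listening` is a hypothetical helper, not part of tika or tika-python.

```python
import socket


def is_listening(host: str = "localhost", port: int = 9998, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Port 9998 is assumed here because it is tika-server's default;
    pass another port if the server was started elsewhere.
    """
    try:
        # create_connection raises an OSError subclass (e.g.
        # ConnectionRefusedError or a timeout) when nothing is listening.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_listening()` returning False right after launching the jar suggests the server failed to start (for instance, because of an unsupported Java version) rather than a problem in the Python code.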

How do you use Tika in Python?

Tika-Python is a Python binding to the Apache Tika™ REST services that allows Tika to be called natively from Python. Installation: install the package with pip from a terminal. To extract content from PDF files, use the from_file() method of the parser object.
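A small wrapper around `parser.from_file()` makes its result easier to consume. This is a sketch, not part of tika-python: `extract_text` is a hypothetical name, and it assumes the returned dict carries the extracted text under "content" and the metadata under "metadata". The parse function can be injected so the wrapper can be exercised without Java or a running Tika server.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def extract_text(
    path: str,
    from_file: Optional[Callable[[str], Dict[str, Any]]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Parse a file and return (text, metadata).

    from_file defaults to tika.parser.from_file; a stub can be passed
    instead so the function runs without tika, Java, or a server.
    """
    if from_file is None:
        # Imported lazily so the stubbed path needs no tika install.
        from tika import parser
        from_file = parser.from_file
    parsed = from_file(path)
    # "content" may be None when no text is extracted (e.g. image-only PDFs).
    text = (parsed.get("content") or "").strip()
    metadata = parsed.get("metadata") or {}
    return text, metadata
```

With tika installed and a suitable Java on the PATH, `text, meta = extract_text("report.pdf")` would parse a real file (the file name here is only a placeholder).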


2 Answers

According to the Apache Tika site, new versions of tika-server.jar require Java 8:

24 April 2018: Apache Tika Release Apache Tika 1.18 has been released! This release includes bug fixes (e.g. extraction from grouped shapes in PPT), security fixes and upgrades to dependencies. PLEASE NOTE: The next versions will require Java 8. Please see the CHANGES.txt file for the full list of changes in the release and have a look at the download page for more information on how to obtain Apache Tika 1.18.

The current, outdated docs for the tika Python library say that Java 7 is needed, but Java 8 must now be installed. This is because the library automatically downloads the current version of tika-server.jar at runtime if it is not found in your temp directory.
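Since the fix here is upgrading Java, a quick check of which Java version is on the PATH can rule this cause in or out. This is a minimal sketch with hypothetical helper names; it assumes `java` is on the PATH and prints its usual `version "..."` line to stderr. Legacy releases report versions like "1.8.0_181" (major version 8), while Java 9 and later report "11.0.2" style numbers.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the Java major version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("unrecognised java -version output")
    first, second = match.group(1), match.group(2)
    # Old "1.x" numbering: the real major version is the second component.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major() -> int:
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

For example, `installed_java_major() < 8` being True would point to the Java 7 problem described above.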

After installing Java 8, my basic test code launched the server and worked without error.

autry.richard answered Oct 16 '22 18:10



After you import tika, you need to initialize the Java server:

```python
import tika
tika.initVM()
from tika import parser

parsed = parser.from_file('')  # file name should be here
```
A. Pond answered Oct 16 '22 17:10
