tika solr integration

I am trying to index a document using a curl-based request.

The request is:

curl "http://localhost:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf"

On submitting the request, I get this error:

HTTP Status 400 - ERROR:unknown field 'ignored_meta'

type: Status report
message: ERROR:unknown field 'ignored_meta'
description: The request sent by the client was syntactically incorrect (ERROR:unknown field 'ignored_meta').

Apache Tomcat/6.0.18
naveen gupta asked May 31 '11 11:05




1 Answer

Your problem is that the default configuration of the ExtractingRequestHandler in solrconfig.xml puts the fields that Tika extracts, but that the schema does not recognize, into fields named 'ignored_XXXXX'.

To solve this, add a dynamic field named 'ignored_*' to your Solr schema (schema.xml), like this:

<dynamicField name="ignored_*" type="ignored"/>

Also add the 'ignored' field type if you have removed it from the default configuration:

<fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />

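This stops Solr from rejecting the request with an 'unknown field' error when Tika extracts fields that the schema does not define. Because the 'ignored' type is neither indexed nor stored, the values of those fields are simply discarded.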
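
As a rough illustration of why the 'ignored_*' dynamic field makes the error go away, here is a minimal Python sketch of how a field name can be checked against explicit fields and dynamic-field patterns. This is not Solr's actual code: the function names `matches_dynamic` and `is_known_field` and the sample field lists are hypothetical, and the sketch assumes patterns with a single '*' at the start or the end.

```python
def matches_dynamic(pattern: str, name: str) -> bool:
    """Return True if `name` matches a dynamic-field style pattern.

    Assumption: the pattern has a single '*' at its start or its end.
    """
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    return name == pattern


def is_known_field(name, fields, dynamic_patterns):
    """A field is known if it is declared explicitly or matches a pattern."""
    return name in fields or any(matches_dynamic(p, name) for p in dynamic_patterns)


# Hypothetical schema contents, for illustration only.
fields = {"id", "text"}
before = ["attr_*"]               # schema without the 'ignored_*' dynamic field
after = ["attr_*", "ignored_*"]   # schema with it added

print(is_known_field("ignored_meta", fields, before))  # False
print(is_known_field("ignored_meta", fields, after))   # True
```

With only 'attr_*' declared, 'ignored_meta' matches nothing and the update is refused; once 'ignored_*' is declared, the name resolves to the 'ignored' type.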

elwood answered Sep 24 '22 23:09
