Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Basic UIMA with SOLR

Tags:

solr

lucene

uima

I am trying to connect UIMA with Solr. I have downloaded the Solr 3.5 dist and have it successfully running with nutch and tika on windows 7 using solrcell and curl via cygwin. To begin, I copied the 6 jars from solr/contrib/uima/lib to the working /lib in solr. Next, I read the readme.txt file in solr/contrib/uima/lib and edited both my solrconfig.xml and schema.xml to no avail. I then found this link which seemed a bit more applicable since I didnt care to use Alchemy or OpenCalais: http://code.google.com/a/apache-extras.org/p/rondhuit-uima/?redir=1 Still- when I run a curl command that imports a pdf via solrcell I do not get the additional UIMA fields nor do I get anything on my logs. The test.pdf is parsed though and I see the pdf in Solr using:

curl 'http://localhost:8080/solr/update/extract?fmap.content=content&literal.id=doc1&commit=true' -F "[email protected]"

SolrConfig.XML

<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
        <str name="host">http://localhost</str>
        <str name="port">8080</str>
      </lst>
      <str name="analysisEngine">C:\uima\desc\com\rondhuit\uima\desc\NextAnnotatorDescriptor.xml</str>
      <bool name="ignoreErrors">true</bool>
      <str name="logField">id</str>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">com.rondhuit.uima.next.NamedEntity</str>
          <lst name="mapping">
            <str name="feature">entity</str>
            <str name="fieldNameFeature">uname</str>
            <str name="dynamicField">*_sm</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update/uima" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uima</str>
  </lst>
</requestHandler>

AND I ALSO ADJUSTED MY requestHander:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.processor">uima</str>
    </lst>
  </requestHandler>

Schema.XML

<!-- fields for UIMA -->
<field name="uname" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
<dynamicField name="*_sm"  type="string"  indexed="true"  stored="true"/>

All I am trying to do is have UIMA pull out names from text (just to start as a demo) and cannot figure out what I am doing wrong. Thank you in advance for reading this.

like image 344
Chris Avatar asked Apr 30 '26 02:04

Chris


1 Answers

Not sure if this ever got addressed, but in case someone else is looking, I had this same problem yesterday. Figured out that I was calling /update/extract to use solrcell, which doesn't use uima because it's integrated into /update.

like image 51
McAm Avatar answered May 02 '26 22:05

McAm