I've tried to follow the Nutch tutorial but am having a bit of a problem with the schema.xml file.
I was told to copy the Nutch-provided schema into my project, essentially this...
cp ${NUTCH_RUNTIME_HOME}/conf/schema.xml ${APACHE_SOLR_HOME}/example/solr/conf/
I have deployed my Solr file in Tomcat, and the error I get when I go to the Solr dashboard is:
collection1: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Plugin init failure for [schema.xml] fieldType "text":
Plugin init failure for [schema.xml] analyzer/filter:
Error loading class 'solr.EnglishPorterFilterFactory'
This relates to the following element in my schema.xml file (I can comment it out, but I'm not sure how important it is yet):
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
I have edited my solrconfig.xml to try to include a range of jar files that come with Solr, specifically:
<lib path="/etc/solr/collection1/libs/dist/solr-core-4.2.1.jar" />
<lib path="/etc/solr/collection1/libs/dist/solr-analysis-extras-4.2.1.jar" />
But I don't think they contain the missing class "solr.EnglishPorterFilterFactory".
Does anyone have any idea why this might not be working, or whether I have missed something? I'm not a Java developer, by the way, so no doubt it will be something simple :)
UPDATE: After finding out that the schema referenced some old classes, I had another look in nutch/conf, and it looks like there is a ${NUTCH_RUNTIME_HOME}/conf/schema-solr4.xml file, which seems to work.
I'm not 100% sure this is correct, but hey...
It looks like EnglishPorterFilterFactory is no longer around in 4.x. See the note in its 3.6.0 documentation:
Deprecated.
Use SnowballPorterFilterFactory with language="English" instead
A lot of deprecated stuff went away in 4.0. I'd do what the note says and switch filters; see the documentation for SnowballPorterFilterFactory for the details.
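As a rough sketch of what that change could look like, the filter line from the question would be swapped for the Snowball equivalent inside the "text" fieldType's analyzer in schema.xml. Carrying over the protected="protwords.txt" attribute is an assumption based on the original line; drop it if you have no protwords.txt file in your conf directory.

```xml
<!-- Before (class removed in Solr 4.x): -->
<!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->

<!-- After: Snowball stemmer configured for English, as the deprecation note suggests.
     Keeping protected="protwords.txt" is an assumption carried over from the old line. -->
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
```

The same replacement would need to be made in every analyzer (index and query) that references the old class, and the core restarted or reloaded afterwards so that Solr re-reads the schema.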