I need to verify that text extraction is working on my Solr installation on Windows Server 2003. All the examples I found for uploading files to Solr use curl, as shown below.
curl "http://localhost:8983/solr/update/extract?&extractOnly=true" --data-binary @tutorial.html -H 'Content-type:text/html'
How can I do this in Windows? I want to upload a PDF and a Word document as a test, then confirm that I can search for words contained in those documents using the Solr admin page.
Importing the data: open http://localhost:8983/solr in a browser to access the Solr admin page and choose your core. A new menu then appears; choose Data Import from that menu to open the import view.
If you are running Windows, you can start Solr by running bin\solr.cmd instead. This will start Solr in the background, listening on port 8983. When you start Solr in the background, the script will wait to make sure Solr starts correctly before returning to the command line prompt.
Solr includes a simple command line tool, bin/post, for POSTing various types of content to a Solr server. The bin/post tool is a Unix shell script; for Windows (non-Cygwin) usage, use post.jar as described below.
With the examples comes a post.jar (see folder example\exampledocs of the apache-solr-X.X.X.zip):
java -jar post.jar -h
This is a simple command line tool for POSTing raw data to a Solr
port. Data can be read from files specified as commandline args,
as raw commandline arg strings, or via STDIN.
Examples:
java -jar post.jar *.xml
java -Ddata=args -jar post.jar '<delete><id>42</id></delete>'
java -Ddata=stdin -jar post.jar < hd.xml
java -Durl=http://localhost:8983/solr/update/csv -Dtype=text/csv -jar post.jar *.csv
java -Durl=http://localhost:8983/solr/update/json -Dtype=application/json -jar post.jar *.json
java -Durl=http://localhost:8983/solr/update/extract?literal.id=a -Dtype=application/pdf -jar post.jar a.pdf
Other options controlled by System Properties include the Solr
URL to POST to, the Content-Type of the data, whether a commit
or optimize should be executed, and whether the response should
be written to STDOUT. These are the defaults for all System Properties:
-Ddata=files
-Dtype=application/xml
-Durl=http://localhost:8983/solr/update
-Dcommit=yes
-Doptimize=no
-Dout=no
OR
Windows PowerShell 3.0 and later include an Invoke-WebRequest command that can be used for this, provided that version of PowerShell is available on your system. See this blog post.
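If neither curl nor a recent PowerShell is at hand, the same HTTP POST can be made from any scripting language that runs on Windows. The following is only a minimal sketch in Python, not something from the answers above: the function names build_extract_url and post_file are made up for illustration, the base URL is the default one used in this thread, and literal.id, extractOnly and commit are the request parameters of the /update/extract handler.

```python
import urllib.parse
import urllib.request

# Assumption: the default Solr address used in the examples of this thread.
SOLR_BASE = "http://localhost:8983/solr"


def build_extract_url(base_url, doc_id=None, extract_only=False, commit=True):
    """Build a URL for the /update/extract handler (hypothetical helper)."""
    params = []
    if extract_only:
        # Only return the extracted text; nothing is indexed.
        params.append(("extractOnly", "true"))
    else:
        if doc_id is not None:
            # literal.id sets the unique id of the indexed document.
            params.append(("literal.id", doc_id))
        if commit:
            # Commit right away so the document becomes searchable.
            params.append(("commit", "true"))
    url = base_url.rstrip("/") + "/update/extract"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def post_file(url, path, content_type):
    """POST the raw bytes of a file, like curl --data-binary @file."""
    with open(path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8", errors="replace")


# Example calls (they need a running Solr, so they are left commented out):
# post_file(build_extract_url(SOLR_BASE, doc_id="a"), "a.pdf", "application/pdf")
# post_file(build_extract_url(SOLR_BASE, doc_id="b"), "b.doc", "application/msword")
```

Calling build_extract_url with extract_only=True gives the equivalent of the curl command in the question, which returns the extracted text without indexing anything.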
With Solr 5.0 you have to specify the core name when updating documents, so the command to post all the examples in exampledocs is:
java -Dc="core_name" -jar post.jar *.xml
Here, replace core_name with the name of your core.
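To confirm that an uploaded document is searchable without clicking through the admin page, you can query the select handler directly. This is a rough sketch rather than an official recipe: build_select_url and count_hits are hypothetical helper names, the core name is a placeholder, and which field the query word is matched against depends on the default search field configured for your core.

```python
import json
import urllib.parse
import urllib.request


def build_select_url(base_url, query, core=None, rows=10):
    """Build a /select URL; pass core for Solr 5+ style core-specific URLs."""
    root = base_url.rstrip("/")
    if core:
        # With a core, the handler lives under /solr/<core>/select.
        root += "/" + urllib.parse.quote(core)
    params = [("q", query), ("wt", "json"), ("rows", str(rows))]
    return root + "/select?" + urllib.parse.urlencode(params)


def count_hits(select_url):
    """Run the query and return the number of matching documents."""
    with urllib.request.urlopen(select_url) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Solr's JSON response reports the total match count in response.numFound.
    return payload["response"]["numFound"]


# Example call (needs a running Solr, so it is left commented out):
# print(count_hits(build_select_url("http://localhost:8983/solr", "tutorial", core="core_name")))
```

A count greater than zero for a word that only occurs in the uploaded PDF or Word file suggests that text extraction and indexing both worked.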