Firstly, I'm aware of some similar questions along the lines of this one, but I think this situation is different enough to warrant its own question.
I'm running a Solr index through a Jetty install on a LAMP server. I currently use the simplexml_load_file function to bring in the search results and then parse them through a couple of functions. I was happy with this process until I started running into a fundamental problem.
The field names don't get passed through the simplexml function. For example, this result:
<doc>
<float name="score">0.73325396</float>
<str name="add1">Ravensbridge Drive</str>
<str name="comments">0</str>
<str name="company">Stratstone Lotus Leicester</str>
<str name="feed_id"/>
<str name="id">1711765</str>
<str name="pcode">LE4 0BX</str>
<str name="psearch">LE4</str>
<str name="rating">0</str>
</doc>
will look like this in the SimpleXML object:
[doc] => Array
    (
        [0] => SimpleXMLElement Object
            (
                [float] => 0.73325396
                [str] => Array
                    (
                        [0] => Ravensbridge Drive
                        [1] => 0
                        [2] => Stratstone Lotus Leicester
                        [3] => SimpleXMLElement Object
                            (
                                [@attributes] => Array
                                    (
                                        [name] => feed_id
                                    )
                            )
                        [4] => 1711765
                        [5] => LE4 0BX
                        [6] => LE4
                        [7] => 0
                    )
            )
    )
When a full dataset is found, there are 11 pieces of data stored in the array, but when some are missing, the data moves around and my parser comes unstuck.
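For reference, the positional problem described above goes away if each field is looked up by its name attribute rather than by its index. The following is only a minimal sketch under that assumption: solrDocToArray is a hypothetical helper name, the sample XML is a shortened copy of the result shown earlier, and multi-valued (arr) fields are not handled.

```php
<?php
// Hypothetical helper: index each child of a Solr <doc> by its "name"
// attribute, so a missing field never shifts the position of the others.
function solrDocToArray(SimpleXMLElement $doc): array
{
    $fields = [];
    foreach ($doc->children() as $field) {
        $name = (string) $field['name'];
        if ($name === '') {
            continue; // skip elements that carry no name attribute
        }
        // Simple scalar fields only; <arr> fields would need extra handling.
        $fields[$name] = (string) $field;
    }
    return $fields;
}

// Shortened copy of the sample result from the question.
$xml = <<<XML
<doc>
<float name="score">0.73325396</float>
<str name="add1">Ravensbridge Drive</str>
<str name="feed_id"/>
<str name="id">1711765</str>
</doc>
XML;

$doc = solrDocToArray(simplexml_load_string($xml));
echo $doc['add1'], "\n"; // prints "Ravensbridge Drive"
```

An empty element such as feed_id simply becomes an empty string under its own key.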
So I've looked at libraries and classes to do it properly, namely the two main ones, Apache Solr and solr-php-client. Both seem overcomplicated, with few real-world examples, and neither looks like it supports different Solr cores, of which I use several.
What's the best thing to use? I'm pretty stuck here now, so any help would be massively appreciated.
Thanks!
For really large data sets, you'll need to watch out for the hard limit of two billion documents per Solr core.
You can use the rows parameter to paginate results from a query. The parameter specifies the maximum number of documents from the complete result set that Solr should return to the client at one time. The default value is 10. That is, by default, Solr returns 10 documents at a time in response to a query.
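As a rough illustration of paging with rows, the sketch below builds the query string for a given page by combining rows with the start offset. solrPageParams is a hypothetical helper, not part of Solr or of any client library, and the page numbering (starting at 1) is an assumption of this example.

```php
<?php
// Hypothetical helper: build the query string for one page of results.
// "start" is the zero-based offset of the first document to return,
// "rows" is the maximum number of documents in the page.
function solrPageParams(string $q, int $page, int $perPage = 10): string
{
    $page = max(1, $page); // treat anything below 1 as the first page
    return http_build_query(
        [
            'q'     => $q,
            'start' => ($page - 1) * $perPage,
            'rows'  => $perPage,
        ],
        '',
        '&'
    );
}

echo solrPageParams('fox', 3, 50), "\n"; // prints "q=fox&start=100&rows=50"
```

The resulting string would be appended to the select URL of whichever core is being queried.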
solr-php-client: a third-party PHP library for indexing and searching documents within an Apache Solr installation. Zip files and tarballs can be found at SolrPhpClient. It supports adding, deleting (by id and by query), committing, optimizing and, of course, searching against a Solr instance.
Solr works by gathering, storing and indexing documents from different sources and making them searchable in near real-time. It follows a three-step process of indexing, querying and ranking the results, and it does so even when working with huge volumes of data.
Definitely use one of the existing clients. As for multiple core support, it's as simple as creating an instance of the client for each instance of Solr (that is, one per core).
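To make the one-client-per-core idea concrete, here is a small sketch that only computes a path for each core. It assumes the common multicore URL layout /solr/<core>/, which may differ on your install; solrCorePath and the core names are hypothetical.

```php
<?php
// Hypothetical helper: map a core name to its URL path, assuming the
// layout /solr/<core>/ (an assumption, adjust it to your own setup).
function solrCorePath(string $core, string $base = '/solr'): string
{
    return rtrim($base, '/') . '/' . rawurlencode($core) . '/';
}

$paths = [];
foreach (['companies', 'reviews'] as $core) { // hypothetical core names
    $paths[$core] = solrCorePath($core);
}
print_r($paths);

// With solr-php-client loaded, one service per core could then be created:
// $clients[$core] = new Apache_Solr_Service('localhost', '8080', $paths[$core]);
```

Each client then talks to exactly one core, so the rest of the code never needs to know that several cores exist.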
The PHP Solr extension is the more powerful of the two while still being quite intuitive to use. Here are a couple of sample code snippets that make a basic query and get the results back using both libraries:
PHP Solr extension
<?php
$options = array(
    'hostname' => 'localhost',
    'port'     => '8080',
    'path'     => '/solr',
);
$client = new SolrClient($options);
$query = new SolrQuery();
$query->setQuery('fox');
$query->setStart(0);
$query->setRows(50);
// specify which fields we want to retrieve
$query->addField('id')->addField('title_t')->addField('source_t');
$res = $client->query($query)->getResponse();
// what does the response look like?
var_dump($res);
/*
object(SolrObject)[4]
public 'responseHeader' =>
object(SolrObject)[5]
public 'status' => int 0
public 'QTime' => int 0
public 'params' =>
object(SolrObject)[6]
public 'fl' => string 'id,title_t,source_t' (length=19)
public 'indent' => string 'on' (length=2)
public 'start' => string '0' (length=1)
public 'q' => string 'fox' (length=3)
public 'wt' => string 'xml' (length=3)
public 'rows' => string '50' (length=2)
public 'version' => string '2.2' (length=3)
public 'response' =>
object(SolrObject)[7]
public 'numFound' => int 39
public 'start' => int 0
public 'docs' =>
array
0 =>
object(SolrObject)[8]
...
1 =>
object(SolrObject)[9]
...
2 =>
object(SolrObject)[10]
...
(...)
*/
// what does a document look like?
var_dump($res->response->docs[0]);
/*
object(SolrObject)[8]
public 'id' => int 11408
public 'source_t' => string 'CBD News Headlines' (length=18)
public 'title_t' => string 'Hunting across Southeast Asia weakens forests' survival' (length=55)
*/
solr-php-client (official example of use)
require_once 'library/SolrPhpClient/Apache/Solr/Service.php';
$solr = new Apache_Solr_Service('localhost', '8080', '/solr');
if (!$solr->ping()) {
    exit('Solr service not responding.');
}
$offset = 0;
$limit = 50;
$query = 'fox';
$res = $solr->search($query, $offset, $limit);
// what does the response look like?
var_dump($res->response);
/*
object(stdClass)[6]
public 'numFound' => int 39
public 'start' => int 0
public 'docs' =>
array
0 =>
object(Apache_Solr_Document)[46]
protected '_documentBoost' => boolean false
protected '_fields' =>
array
...
protected '_fieldBoosts' =>
array
...
1 =>
object(Apache_Solr_Document)[47]
protected '_documentBoost' => boolean false
protected '_fields' =>
array
...
protected '_fieldBoosts' =>
array
...
(...)
*/
// what does a document look like?
var_dump($res->response->docs[0]);
/*
object(Apache_Solr_Document)[46]
protected '_documentBoost' => boolean false
protected '_fields' =>
array
'publicationTime_i' => int 1257724800
'publicationDate_t' => string 'Mon, 9 Nov 2009' (length=15)
'url_s' => string 'http://news.mongabay.com/2009/1108-hance_corlett.html' (length=53)
'language_s' => string 'EN' (length=2)
'title_t' => string 'Hunting across Southeast Asia weakens forests' survival' (length=55)
'text' => string 'A large flying fox eats a fruit ingesting its seeds.' (length=52)
'id' => int 11408
'relevance_i' => int 27
'source_t' => string 'CBD News Headlines' (length=18)
protected '_fieldBoosts' =>
array
'publicationTime_i' => boolean false
'publicationDate_t' => boolean false
'url_s' => boolean false
'language_s' => boolean false
'title_t' => boolean false
'text' => boolean false
'id' => boolean false
'relevance_i' => boolean false
'source_t' => boolean false
*/