Ideal way of dealing with Solr results in PHP?

Firstly, I'm aware of some similar questions along these lines, but I think this situation is different enough to warrant its own question.

I'm running a Solr index through a Jetty install on a LAMP server. I currently use the simplexml_load_file function to bring in the search results and then parse them through a couple of functions. I was happy with this process until I ran into a fundamental problem.
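For reference, here is a minimal sketch of the kind of request URL that gets handed to simplexml_load_file. It assumes the standard `/select` handler. The host, port and core name (`dealers`) are hypothetical placeholders. In a multi-core setup the core is just part of the URL path, so each core gets its own base URL.

```php
<?php
// Build a Solr select URL suitable for simplexml_load_file().
// Host, port and the core name 'dealers' are hypothetical placeholders.
function solrSelectUrl(string $base, string $core, array $params): string
{
    // Each core lives under its own path, e.g. /solr/<core>/select
    return rtrim($base, '/') . '/' . $core . '/select?' . http_build_query($params);
}

$url = solrSelectUrl('http://localhost:8080/solr', 'dealers', [
    'q'     => 'lotus',
    'start' => 0,
    'rows'  => 10,   // maximum number of documents returned per request
    'wt'    => 'xml',
]);

// $xml = simplexml_load_file($url); // needs a running Solr instance
```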

The field names don't show up as keys in the SimpleXML object. For example, this result:

<doc>
  <float name="score">0.73325396</float>
  <str name="add1">Ravensbridge Drive</str>
  <str name="comments">0</str>
  <str name="company">Stratstone Lotus Leicester</str>
  <str name="feed_id"/>
  <str name="id">1711765</str>
  <str name="pcode">LE4 0BX</str>
  <str name="psearch">LE4</str>
  <str name="rating">0</str>
</doc>

When loaded with SimpleXML, it looks like this:

 [doc] => Array
 (
   [0] => SimpleXMLElement Object
   (
     [float] => 0.73325396
     [str] => Array
     (
       [0] => Ravensbridge Drive
       [1] => 0
       [2] => Stratstone Lotus Leicester
       [3] => SimpleXMLElement Object
       (
         [@attributes] => Array
         (
           [name] => feed_id
         )
       )
       [4] => 1711765
       [5] => LE4 0BX
       [6] => LE4
       [7] => 0
     )
   )

When a full record is found, there are 11 values stored in the array, but when some fields are missing, the remaining values shift position and my parser comes unstuck.
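One way to sidestep the shifting positions, even without a client library, is to key each value by its `name` attribute instead of its position. The following is a minimal sketch using only SimpleXML. The sample document from above is trimmed to a few fields and wrapped in a minimal `<response>`/`<result>` envelope, which is the usual shape of Solr's XML response. Multi-valued `<arr>` fields are not handled here.

```php
<?php
// Sample <doc> from the question, trimmed, inside a minimal response envelope.
$xml = <<<XML
<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <float name="score">0.73325396</float>
      <str name="add1">Ravensbridge Drive</str>
      <str name="company">Stratstone Lotus Leicester</str>
      <str name="feed_id"/>
      <str name="id">1711765</str>
      <str name="pcode">LE4 0BX</str>
    </doc>
  </result>
</response>
XML;

$response = simplexml_load_string($xml);

$rows = [];
foreach ($response->result->doc as $doc) {
    $row = [];
    // children() yields every field element whatever its type (str, float, int...)
    foreach ($doc->children() as $field) {
        // Key by the name attribute, never by position
        $row[(string) $field['name']] = (string) $field;
    }
    $rows[] = $row;
}
```

Because each value is looked up by name, a missing field simply leaves that key absent instead of moving every later value.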

So I've looked at libraries/classes to do it properly, namely the two main ones: the Apache Solr PHP extension and solr-php-client. Both seem overcomplicated, with few real-world examples, and neither looks like it supports multiple Solr cores, of which I use several.

What's the best thing to use? I'm pretty stuck here now; any help would be MASSIVELY appreciated.

Thanks!

Tom, asked Jun 04 '10 11:06



1 Answer

Definitely use one of the existing clients. As for multiple-core support, it's as simple as creating a client instance for each core, pointing each one at that core's path (e.g. /solr/core0).
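A minimal sketch of the one-client-per-core idea with the PECL extension. The core names `dealers` and `reviews` are hypothetical, and the clients are only created when the `SolrClient` class is available. With solr-php-client the same idea applies by passing the core path as the third constructor argument, e.g. `new Apache_Solr_Service('localhost', 8080, '/solr/dealers')`.

```php
<?php
// Build one set of SolrClient options per core.
// 'dealers' and 'reviews' are hypothetical core names.
function solrCoreOptions(string $host, int $port, array $cores): array
{
    $options = [];
    foreach ($cores as $core) {
        $options[$core] = [
            'hostname' => $host,
            'port'     => $port,
            'path'     => '/solr/' . $core, // each core has its own path
        ];
    }
    return $options;
}

$coreOptions = solrCoreOptions('localhost', 8080, ['dealers', 'reviews']);

// One client per core; only attempted when the PECL Solr extension is loaded.
$clients = [];
if (class_exists('SolrClient')) {
    foreach ($coreOptions as $core => $opts) {
        $clients[$core] = new SolrClient($opts);
    }
}
```

Each query then goes to the client for the relevant core, e.g. `$clients['dealers']->query($query)`.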

The Solr extension is much more powerful while still being quite intuitive to use. Here are a couple of sample code snippets that make a basic query and get the results back using each library:

PHP Solr extension

<?php
$options = array
(
    'hostname' => 'localhost',
    'port'     => '8080',
    'path'     => '/solr'
);

$client = new SolrClient($options);

$query = new SolrQuery();
$query->setQuery('fox');
$query->setStart(0);
$query->setRows(50);
// specify which fields we want to retrieve
$query->addField('id')->addField('title_t')->addField('source_t');

$res = $client->query($query)->getResponse();

// what does the response look like?
var_dump($res);
/*
object(SolrObject)[4]
  public 'responseHeader' => 
    object(SolrObject)[5]
      public 'status' => int 0
      public 'QTime' => int 0
      public 'params' => 
        object(SolrObject)[6]
          public 'fl' => string 'id,title_t,source_t' (length=19)
          public 'indent' => string 'on' (length=2)
          public 'start' => string '0' (length=1)
          public 'q' => string 'fox' (length=3)
          public 'wt' => string 'xml' (length=3)
          public 'rows' => string '50' (length=2)
          public 'version' => string '2.2' (length=3)
  public 'response' => 
    object(SolrObject)[7]
      public 'numFound' => int 39
      public 'start' => int 0
      public 'docs' => 
        array
          0 => 
            object(SolrObject)[8]
              ...
          1 => 
            object(SolrObject)[9]
              ...
          2 => 
            object(SolrObject)[10]
              ...
          (...)
*/
// what does a document look like?
var_dump($res->response->docs[0]);
/*
object(SolrObject)[8]
  public 'id' => int 11408
  public 'source_t' => string 'CBD News Headlines' (length=18)
  public 'title_t' => string 'Hunting across Southeast Asia weakens forests' survival' (length=55)
*/
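Since a field that isn't returned is simply absent from the document, it helps to read fields with a default. This is a minimal sketch with a hypothetical `docField()` helper. It relies on the PECL `SolrObject` implementing `ArrayAccess` (so `isset($doc['feed_id'])` works). Here it is exercised with `ArrayObject` as a stand-in, because a live `SolrObject` needs a running server.

```php
<?php
// Hypothetical helper: read one field, falling back to a default when absent.
// Accepts arrays, ArrayAccess objects (SolrObject implements ArrayAccess)
// and plain objects with public properties.
function docField($doc, string $field, $default = null)
{
    if (is_array($doc) || $doc instanceof ArrayAccess) {
        return isset($doc[$field]) ? $doc[$field] : $default;
    }
    return isset($doc->$field) ? $doc->$field : $default;
}

// Stand-in for a SolrObject document; feed_id is missing on purpose.
$doc = new ArrayObject(['id' => 11408, 'source_t' => 'CBD News Headlines']);

$source = docField($doc, 'source_t', '');
$feedId = docField($doc, 'feed_id', '');   // field not returned, default used
```

Against a live response this would be used inside a loop such as `foreach ($res->response->docs as $doc) { echo docField($doc, 'title_t', '(untitled)'); }`.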

solr-php-client (official example of use)

<?php
require_once 'library/SolrPhpClient/Apache/Solr/Service.php';

$solr = new Apache_Solr_Service('localhost', '8080', '/solr');

if (!$solr->ping()) {
    exit('Solr service not responding.');
}

$offset = 0;
$limit = 50;

$query = 'fox';
$res = $solr->search($query, $offset, $limit);

// what does the response look like?
var_dump($res->response);

/*
object(stdClass)[6]
  public 'numFound' => int 39
  public 'start' => int 0
  public 'docs' => 
    array
      0 => 
        object(Apache_Solr_Document)[46]
          protected '_documentBoost' => boolean false
          protected '_fields' => 
            array
              ...
          protected '_fieldBoosts' => 
            array
              ...
      1 => 
        object(Apache_Solr_Document)[47]
          protected '_documentBoost' => boolean false
          protected '_fields' => 
            array
              ...
          protected '_fieldBoosts' => 
            array
              ...
     (...)
*/

// what does a document look like?
var_dump($res->response->docs[0]);

/*
object(Apache_Solr_Document)[46]
  protected '_documentBoost' => boolean false
  protected '_fields' => 
    array
      'publicationTime_i' => int 1257724800
      'publicationDate_t' => string 'Mon, 9 Nov 2009' (length=15)
      'url_s' => string 'http://news.mongabay.com/2009/1108-hance_corlett.html' (length=53)
      'language_s' => string 'EN' (length=2)
      'title_t' => string 'Hunting across Southeast Asia weakens forests' survival' (length=55)
      'text' => string 'A large flying fox eats a fruit ingesting its seeds.' (length=52)
      'id' => int 11408
      'relevance_i' => int 27
      'source_t' => string 'CBD News Headlines' (length=18)
  protected '_fieldBoosts' => 
    array
      'publicationTime_i' => boolean false
      'publicationDate_t' => boolean false
      'url_s' => boolean false
      'language_s' => boolean false
      'title_t' => boolean false
      'text' => boolean false
      'id' => boolean false
      'relevance_i' => boolean false
      'source_t' => boolean false
*/
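If the rest of the code expects every record to have the same shape, the documents can be normalised into fixed-key rows. The following is a minimal sketch with a hypothetical `docToRow()` helper and a field list taken from the question's schema. It assumes that `Apache_Solr_Document` can be iterated with `foreach` as field => value, which as far as I know it can (it implements `IteratorAggregate` over its fields). The stand-in documents below are illustrative only.

```php
<?php
// Hypothetical helper: turn a document into an array with a fixed set of keys.
// Traversable documents, plain objects and arrays are all accepted.
function docToRow($doc, array $fields, $default = null): array
{
    if ($doc instanceof Traversable) {
        $values = iterator_to_array($doc);   // field => value pairs
    } elseif (is_object($doc)) {
        $values = get_object_vars($doc);
    } else {
        $values = (array) $doc;
    }
    // Start from defaults so every row has the same keys in the same order
    $row = array_fill_keys($fields, $default);
    foreach ($fields as $field) {
        if (array_key_exists($field, $values)) {
            $row[$field] = $values[$field];
        }
    }
    return $row;
}

// Field list based on the question's schema
$fields = ['id', 'company', 'add1', 'pcode', 'feed_id'];

// Illustrative stand-in documents; the second lacks add1 and feed_id
$docs = [
    new ArrayIterator(['id' => '1711765', 'company' => 'Stratstone Lotus Leicester',
                       'add1' => 'Ravensbridge Drive', 'pcode' => 'LE4 0BX', 'feed_id' => '']),
    (object) ['id' => '42', 'company' => 'Example Motors', 'pcode' => 'LE1 1AA'],
];

$rows = array_map(fn($d) => docToRow($d, $fields, ''), $docs);
```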
nuqqsa, answered Oct 17 '22 15:10