 

Sphinx Search Engine & Python API

I am trying to use the Sphinx search engine with its Python API. The installation went fine, but when I use the Python API I do not get the complete result set: I only get the IDs and attributes. When I use the ./search binary in ./bin, however, I get the entire indexed content.

When using the C++ ./search binary:

./search test

1. document=1, weight=1, group_id=1, date_added=Sat Sep 11 07:42:38 2010, title=2
    id=1
    group_id=1
    group_id2=5
    date_added=2010-09-11 07:42:38
    title=test one
    content=this is my test document number one. also checking search within phrases.

But when I use the Python API, I get:

>>> import sphinxapi
>>> client = sphinxapi.SphinxClient()
>>> client.SetServer('127.0.0.1', 9312)
>>> client.Query('test')
{'status': 0, 'matches': [{'id': 1, 'weight': 1, 'attrs': {'date_added': 1284171158, 'group_id': 1, 'title': 2}}, {'id': 2, 'weight': 1, 'attrs': {'date_added': 1284171158, 'group_id': 1, 'title': 3}}, {'id': 4, 'weight': 1, 'attrs': {'date_added': 1284171158, 'group_id': 2, 'title': 1}}], 'fields': ['content'], 'time': '0.022', 'total_found': 3, 'warning': '', 'attrs': [['group_id', 1], ['date_added', 2], ['title', 3]], 'words': [{'docs': 6, 'hits': 6, 'word': 'test'}], 'error': '', 'total': 3}

How do I get the string fields like 'title' or 'content' as part of the result set?

Srikar Appalaraju asked Sep 11 '10 20:09



2 Answers

Although it is possible, I don't think it's a good idea to store the source text in Sphinx. Sphinx is fastest as a dedicated search engine that returns only document IDs (and ranking weights, if you need them); the full rows are then fetched from your own database by ID.
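A minimal sketch of that pattern: Sphinx hands back only IDs, and the full rows are looked up in the database they were indexed from. It assumes a hypothetical table named documents with id, title and content columns, and a hypothetical helper fetch_documents. sqlite3 stands in here for the MySQL database Sphinx indexed; with a MySQL driver the placeholders would be %s instead of ?.

```python
import sqlite3


def fetch_documents(conn, result, table="documents"):
    """Return (id, title, content) rows for a SphinxClient.Query() result.

    Only the match IDs come from Sphinx; the text comes from the database.
    The table name and its columns are hypothetical.
    """
    ids = [match["id"] for match in result.get("matches", [])]
    if not ids:
        return []
    placeholders = ",".join(["?"] * len(ids))
    # The table name is interpolated, so it must be a trusted constant, never user input
    cur = conn.execute(
        f"SELECT id, title, content FROM {table} WHERE id IN ({placeholders})", ids
    )
    rows_by_id = {row[0]: row for row in cur.fetchall()}
    # WHERE id IN (...) gives no ordering guarantee, so restore Sphinx's ranking order
    return [rows_by_id[i] for i in ids if i in rows_by_id]


# Demo data: row 1 is the document from the question; rows 2 and 4 are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        (1, "test one", "this is my test document number one. also checking search within phrases."),
        (2, "placeholder two", "placeholder text"),
        (4, "placeholder four", "placeholder text"),
    ],
)

# Same shape as the Query() output in the question (attrs trimmed for brevity)
result = {"status": 0, "matches": [{"id": 1, "weight": 1}, {"id": 2, "weight": 1}, {"id": 4, "weight": 1}]}

for doc_id, title, content in fetch_documents(conn, result):
    print(doc_id, title)
```

This keeps the index small and fast at the cost of one extra database query per search.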

By the way, the official Sphinx Python API is rarely updated. You can instead talk to searchd through its MySQL-compatible SphinxQL interface using any MySQL driver/module (e.g. pymysql). For example:

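import pymysql

# searchd speaks the MySQL protocol (SphinxQL) on the port configured as
# "listen = <port>:mysql41" in sphinx.conf; 9301 is this setup's port.
db = pymysql.connect(host='127.0.0.1', port=9301, user='', password='', charset='utf8')
cur = db.cursor()
# "your Query"/1 is a quorum match: documents containing at least 1 of the words
qry = "SELECT id, weight() FROM idx_name WHERE MATCH('\"your Query\"/1') LIMIT 10 OPTION ranker=SPH04"
cur.execute(qry)
rows = cur.fetchall()
print(rows)
cur.close()
db.close()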
taufikedys answered Sep 20 '22 18:09



You could use sql_field_string. Add to your config:

source YOUR_SOURCE
{
    # ...your existing sql_query and other settings...
    sql_field_string = title
    sql_field_string = content
}

After you rebuild the index, these columns are indexed as full-text fields and also stored as string attributes, so you get them in your result set without an additional SQL query.

However, like all attributes, string attributes are always loaded into memory, so with a large index you could quickly run out of RAM on your box.
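With sql_field_string in place, nothing changes on the Python side: after re-indexing, the strings should appear in each match's attrs dict next to group_id and date_added. A small helper to pull them out might look like the sketch below. The helper name string_attrs and the sample result are hypothetical, and since some versions of sphinxapi.py may return string attributes as bytes, the sketch decodes them.

```python
def string_attrs(result, names=("title", "content"), encoding="utf-8"):
    """Return [{'id': ..., 'title': ..., 'content': ...}, ...] from a Query() result.

    Assumes title/content were declared with sql_field_string, so Sphinx
    returns them as string attributes in match['attrs'].
    """
    docs = []
    for match in result.get("matches", []):
        attrs = match.get("attrs", {})
        doc = {"id": match["id"]}
        for name in names:
            value = attrs.get(name)
            # Some sphinxapi.py versions may hand string attributes back as bytes
            if isinstance(value, bytes):
                value = value.decode(encoding)
            doc[name] = value
        docs.append(doc)
    return docs


# Live usage (needs a running searchd and the index rebuilt with sql_field_string):
#   import sphinxapi
#   client = sphinxapi.SphinxClient()
#   client.SetServer('127.0.0.1', 9312)
#   docs = string_attrs(client.Query('test'))

# Offline example with a hypothetical result of that shape (title given as bytes on purpose)
sample = {
    "matches": [
        {"id": 1, "weight": 1, "attrs": {
            "group_id": 1,
            "title": b"test one",
            "content": "this is my test document number one. also checking search within phrases.",
        }},
    ]
}
print(string_attrs(sample))
```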

tmg_tt answered Sep 16 '22 18:09
