
ElasticSearch pagination through pyes. Offset ignored

I'm working off the pyes usage example here

I'm indexing four documents into test-index and then querying with different offsets. The start parameter doesn't change the offset: I keep getting the same result regardless of its value. Why is this happening?

from pyes import *
conn = ES(["localhost:9200"])
try:
    conn.delete_index('test-index')  # start from a clean index
except Exception:
    pass  # the index may not exist yet

conn.create_index('test-index')

mapping = {u'name': {'boost': 1.0,
                 'index': 'analyzed',
                 'store': 'yes',
                 'type': u'string',
                 "term_vector" : "with_positions_offsets"},
       u'title': {'boost': 1.0,
                 'index': 'analyzed',
                 'store': 'yes',
                 'type': u'string',
                 "term_vector" : "with_positions_offsets"},
       u'position': {'store': 'yes',
                     'type': u'integer'},
       u'uuid': {'boost': 1.0,
                'index': 'not_analyzed',
                'store': 'yes',
                'type': u'string'}}

conn.put_mapping("test-type", {'properties':mapping}, ["test-index"])

conn.index({"name":"Joe Tester", "uuid":"11111", "position":1}, "test-index", "test-type", 1)
conn.index({"name":"Bill Baloney", "uuid":"22222", "position":2}, "test-index", "test-type", 2)
conn.index({"name":"Joe Joseph", "uuid":"33333", "position":3}, "test-index", "test-type", 3)
conn.index({"name":"Last Joe", "uuid":"44444", "position":4}, "test-index", "test-type", 4)

conn.refresh(["test-index"])

q = TermQuery("name", "joe")
r0 = conn.search(q, indices = ["test-index"], start=0, size=1)
r1 = conn.search(q, indices = ["test-index"], start=1, size=1)
r2 = conn.search(q, indices = ["test-index"], start=2, size=1)

print('0: {0}'.format(r0['hits']['hits']))
print('1: {0}'.format(r1['hits']['hits']))
print('2: {0}'.format(r2['hits']['hits']))

Output:

$ python pagination.py 
0: [{u'_score': 0.19178301, u'_type': u'test-type', u'_id': u'4', u'_source': {u'position': 4, u'name': u'Last Joe', u'uuid': u'44444'}, u'_index': u'test-index'}]
1: [{u'_score': 0.19178301, u'_type': u'test-type', u'_id': u'4', u'_source': {u'position': 4, u'name': u'Last Joe', u'uuid': u'44444'}, u'_index': u'test-index'}]
2: [{u'_score': 0.19178301, u'_type': u'test-type', u'_id': u'4', u'_source': {u'position': 4, u'name': u'Last Joe', u'uuid': u'44444'}, u'_index': u'test-index'}]

My pyes version is 0.16.0.

AnalyticsBuilder asked Dec 27 '11 17:12


2 Answers

The problem was in how the request was sent to ES, although I'm still not clear on why it failed.

Instead of passing start and size directly to conn.search, as I did originally:

r0 = conn.search(q, indexes = ["test-index"], start=0, size=1)
r1 = conn.search(q, indexes = ["test-index"], start=1, size=1)
r2 = conn.search(q, indexes = ["test-index"], start=2, size=1)

I wrapped my queries in a pyes.query.Search object:

r0 = conn.search(Search(q, start=0, size=1), indexes = ["test-index"])
r1 = conn.search(Search(q, start=1, size=1), indexes = ["test-index"])
r2 = conn.search(Search(q, start=2, size=1), indexes = ["test-index"])

That worked; see the output below:

0s: [{u'_score': 0.19178301, u'_type': u'test-type', u'_id': u'4', u'_source': {u'position': 4, u'name': u'Last Joe', u'uuid': u'44444'}, u'_index': u'test-index'}]
1s: [{u'_score': 0.19178301, u'_type': u'test-type', u'_id': u'1', u'_source': {u'position': 1, u'name': u'Joe Tester', u'uuid': u'11111'}, u'_index': u'test-index'}]
2s: [{u'_score': 0.19178301, u'_type': u'test-type', u'_id': u'3', u'_source': {u'position': 3, u'name': u'Joe Joseph', u'uuid': u'33333'}, u'_index': u'test-index'}]
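
For reference, the offset in ElasticSearch's own search request body is called from, not start, and it sits next to size. Below is a minimal sketch of the paging arithmetic, assuming a zero-based page number. page_body is a hypothetical helper, not part of pyes, and the resulting dicts would still have to be sent to ES by whatever client you use:

```python
def page_body(query, page, page_size):
    """Build a search request body for a zero-based page number.

    Hypothetical helper for illustration; not a pyes function.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "query": query,
        "from": page * page_size,  # number of hits to skip
        "size": page_size,         # number of hits to return
    }


# Same term query as in the question, written as a plain dict.
term_query = {"term": {"name": "joe"}}

for page in range(3):
    print(page_body(term_query, page, 1))
```

With a page size of 1, as in the question, the from value is simply the page number; with larger pages it advances in steps of page_size.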
AnalyticsBuilder answered Nov 13 '22 09:11


Ensure you're using a search type of Query Then Fetch (query_then_fetch).
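
On the REST side, the search type is passed as the search_type URL parameter of the _search endpoint. A rough sketch of what such a URL looks like, assuming Python 3 and the host and index from the question; search_url is a hypothetical helper, and how pyes exposes this setting is not shown here:

```python
from urllib.parse import urlencode


def search_url(host, index, search_type="query_then_fetch"):
    """Return a _search URL carrying an explicit search_type parameter.

    Hypothetical helper for illustration; not a pyes function.
    """
    params = urlencode({"search_type": search_type})
    return "http://{0}/{1}/_search?{2}".format(host, index, params)


print(search_url("localhost:9200", "test-index"))
# prints http://localhost:9200/test-index/_search?search_type=query_then_fetch
```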

Andy answered Nov 13 '22 09:11