
CentOS 6.4 + Haystack (2.1.0) + ElasticSearch (1.2.1) = SearchParseException ... Parse Failure

Versions:

CentOS          - 6.4 (Final)
Haystack        - 2.1.0
ElasticSearch   - 1.2.1
Java            - 1.7.0_55
Django-cms      - 2.3.1
pyelasticsearch - 0.6

I'm having problems using ElasticSearch in a Django project on a CentOS machine. I'm used to configuring Elasticsearch/Haystack on Ubuntu machines and have never had an issue like this.

I receive the error Parse Failure [No parser for element [񐁱𠁥y𯊐]], even though my index is empty. To find out exactly where the error comes from, I tried the following:

  • Deleting the index and trying again (same error)
  • Trying with an empty index (same error)
  • Reinstalling, then purging and reinstalling
  • Trying older and newer ES versions (0.90, 1.1, 1.2, ...)
  • Checking the Java version
  • Looking for version incompatibilities

At first I thought the error was caused by the content of the index (the data I was trying to index), but after deleting/clearing the index I still got the same error.

I'm now just trying to show an empty queryset in a template, because I don't know what else to try to figure out where the problem is.

Additional information

When I open the Django shell with python manage.py shell and run:

from haystack.query import SearchQuerySet

SearchQuerySet()
SearchQuerySet().all()
SearchQuerySet().filter(content='any_text')

Queries like these run without problems: they return an empty queryset if nothing is found, or the matching results otherwise. I only get the error when I use the same commands in my view and return the result to the template. In the shell everything works normally, whether the index is empty or not.

search_indexes.py

import datetime
from haystack import indexes
from django.contrib.auth.models import User
from cms.models import CMSPlugin, Page

class CMSPluginIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    plugin_type = indexes.CharField(model_attr='plugin_type')
    language = indexes.CharField(model_attr='language')

    def get_model(self):
        return CMSPlugin

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.all()

cms_plugin_text.txt

{{object.language}}  # I added just this field to make sure the indexed content is not the cause

Summary of views.py

from django.http import HttpResponse
from haystack.query import SearchQuerySet

def search_query(request):
    sqs = SearchQuerySet().all()
    return HttpResponse(sqs)

Django Error

Invalid JSON returned from ES: <Response [404]>
Exception Value: Invalid JSON returned from ES: <Response [404]>

Full ElasticSearch error trace in console

[DEBUG][action.search.type] [Futurist] [haystack][2], node[inisR695RtGZ_WnEnkRr1w], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@28cfe3d9] lastShard [true]
        org.elasticsearch.search.SearchParseException: [haystack][2]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"\ud900\udc71\ud840\udc65y\u0000\ud840\udc61": 
    {"\uda00\udc66\ud8c0\udc6c\ud840\udc65\ud8c0\udc65\ud8c0\udc00a\uda00\udc42\udac0\udc65":
     {"\uda00\udc66\ud8c0\udc6c\ud840\udc65\u0000\ud880\udc25\udbc0\udc20": 
    {"\ud800\udc66\ud900\udc75\uda00\udc72\u0000\ud880\udc25\ud8c0\udc20": 
    {"\ud900\udc71\ud840\udc65y\u0000\ud840\udc61": 
    {"\ud900\udc71\ud840\udc65\udb80\udc79\ud8c0\udc73\uda00\udc72\ud980\udc6e\u0000\u0000\uda00
    \udc64\ud8c0\udc73\udb40\udc61\ud900\udc63": {"\ud900\udc71\ud840\udc65y\u0000\ud840\udc61": 
    "\uda40\udc64\udb40\udc61\udb80\udc67\ud880\udc5f\uda40\udc74\ud880\udc28\ud880\udc6d\ud880\
    udc2e\ud880\udc6d\udac0\udc70\ud980\udc75\udb40\udc69\udb80\udc20\udbc0\udc52\ud840\udc70\ud
    b40\udc6f\uda80\udc6f\udac0\udc61\ud880\udc5f\ud800\udc68\ud900\udc72\udb40\udc64\udb80\udc6
    3\udb40\udc75\ud840\udc74\uda00\udc79\ud880\udc00\u0000\u0002\u0000\uda34\ude80\u7f8eD\u0000
    \udbbf\udfff\udbbf\udfff\u0000\ud917\udc22\udbcc\udc38\ud899\udc75\ud917\udc36\udbcc\udc39\u
    d899\udc75\ud917\udc35\udbcc\udc61\ud899\udc75\ud917\udc32\udbcc\udc30\ud9d9\udc75\ud917\udc
    30\ud90c\ude63\ud9d9\udc75"}}, 
    "\ud880\udc5f\ud880\udc61\ud900\udc68\u0000\udb80\udc70\udb40\udc69": true}}, 
    "\ud900\udc71\ud840\udc65y\u0000\ud840\udc61": 
    {"\ud900\udc71\ud840\udc65\udb80\udc79\ud8c0\udc73\uda00\udc72\ud980\udc6e\u0000\u0000\uda00
    \udc64\ud8c0\udc73\udb40\udc61\ud900\udc63": {"\ud900\udc71\ud840\udc65y\u0000\ud840\udc61": 
    "\ud900\udc28)\ud89d\udc30", 
    "\ud900\udc64\ud800\udc66\udac0\udc75\udb80\udc74\udbc0\udc6f\ud840\udc65\ud8c0\udc61\ud840\
    udc6f\u0000\ud99b\udf6c\udb40\udc61\udac0\udc61\uda40\udc79\udb80\udc65\uda00\udc77\ud8c0\ud
    c6c": "\udb40\udc41D\ud8c0\udc67", 
    "\ud900\udc64\ud800\udc66\udac0\udc75\udb80\udc74\uda00\udc66\udac0\udc65d\u0000\u0001\u0000
    \uda18\ude40\u7f8e\u0001": "\ud900\udc74\ud8c0\udc78\udbc0\udc00\udb00\udc65", 
    "\ud900\udc61\udb80\udc74\ud980\udc5f\udb40\udc65\ud840\udc65\ud8c0\udc61\udb80\udc65\ud9c0\
    udc70\ud800\udc72\ud900\udc73\ud800\udc5f\ud900\udc75\uda00\udc72\ud880\udc65\ud8c0\udc00\u7
    36c\u0000\u0000\udba4\udc30\u7f8e\udbbf\udffe\udbbf\udfff\u0001\u0000\u0001\u0000\uda38\ude8
    0\u7f8e": true, 
    "\udb40\udc61\udac0\udc61\uda40\udc79\udb80\udc65\uda00\udc77\ud8c0\udc6c\ud800\udc63\ud8c0\
    udc72\ud880\udc00\u7f8e\b\u0000\uda34\ude80\u7f8e\u0003\u0000": true}}}}, 
    "\ud840\udc66\udb00\udc6f\ud840\udc00\u7f8e": 0, "\uda00\udc73\ud900\udc7a\u0000\u0000": 
    20}]]

at org.elasticsearch.search.SearchService.parseSource(SearchService.java:634)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:507)
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:480)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:252)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.search.SearchParseException: [haystack][3]: from[-1],size[-1]: Parse Failure [No parser for element [񐁱𠁥y𯊐]]
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:620)
    ... 11 more

If you need any other information to figure out what's happening here, don't hesitate to ask and I'll post it as soon as possible.

I'm thinking the problem may come from pyelasticsearch. Has anyone had an issue similar to this?

EDIT

I've tried another thing: I installed ElasticSearch on an Ubuntu server and ran the queries from CentOS against it. I'm indexing the CMSPlugin model from django-cms, and it seems the body/text of the plugins contains some special characters that Java or Elasticsearch fails to parse. This is the first string that makes ElasticSearch/Java crash: \ud900\udc71\ud840\udc65y\u0000\ud900\udc74.

I tried this in the Python console:

c=u'\ud900\udc71\ud840\udc65y\u0000\ud900\udc74'
print c

Output: 񐁱𠁥y񐁴

EDIT 2

I'm wondering whether the Java on CentOS is having some issue. I tried downgrading Java to 1.6, and that didn't work either.

EDIT 3

Right now I'm working directly with ElasticSearch, sending the queries with urllib2 and bypassing Haystack. ElasticSearch answers those queries perfectly (I have to handle the JSON myself). I suppose the issue is in how Haystack generates or parses the queries, because when I use SearchQuerySet() and run something like SearchQuerySet().filter(content='whatever'), in the shell or in the view, ElasticSearch fails with the error above, but the same search works well with curl.
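
The direct approach described above can be sketched roughly as follows. This is a minimal illustration, not the code actually used: it is written for Python 3 (urllib.request is the successor of urllib2), and the host localhost:9200, the query_string query, the default size of 20, and the helper names build_search_body and search are all assumptions. The index name haystack is taken from the log above.

```python
import json
import urllib.request


def build_search_body(text, size=20):
    """Build a JSON search body for Elasticsearch's _search endpoint (hypothetical helper)."""
    body = {"query": {"query_string": {"query": text}}, "size": size}
    # json.dumps escapes non-ASCII characters by default, so the payload is plain ASCII.
    return json.dumps(body)


def search(text, es_url="http://localhost:9200", index="haystack"):
    """POST the query to Elasticsearch and return the decoded JSON response.

    The URL and index name are assumptions; adjust them to your setup.
    """
    request = urllib.request.Request(
        "%s/%s/_search" % (es_url, index),
        data=build_search_body(text).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling search('whatever') needs a running Elasticsearch node; the matching documents are found under the "hits" key of the returned dictionary.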

EDIT 4

In the end I'm working directly with the latest ElasticSearch, without Haystack. It seems the issue is in how Haystack/pyelasticsearch formats the queries it sends to ES: ES cannot parse the encoding and fails on each request.

I didn't find any solution other than avoiding Haystack. If anyone can point to a solution, that would be great. I also sent an e-mail to the Haystack people to see if they have already noticed this issue.

EDIT 5

If anyone has configured Haystack on CentOS, I would appreciate any guidance on the configuration and the software versions. I have ElasticSearch running on CentOS, but I'm managing the queries directly through ElasticSearch, ignoring Haystack completely.
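
For reference, the connection block that Haystack 2.x reads from the Django settings for an Elasticsearch backend looks like the sketch below. The URL and index name are assumptions (the index name haystack matches the log above), not values confirmed for this project.

```python
# settings.py (fragment): Haystack 2.x connection to an Elasticsearch node.
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',  # assumed: ES running locally on the default port
        'INDEX_NAME': 'haystack',         # assumed: matches the index seen in the ES log
    },
}
```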

AlvaroAV asked Nov 10 '22 05:11


1 Answer

I believe the problem here may be the use of pyelasticsearch. With newer django-haystack (I believe >1.0) you need to use elasticsearch-py instead. If you run pip install elasticsearch, it should install the latest library (1.1.1) and fix your problem. As an additional "just in case" measure, you could remove pyelasticsearch with pip uninstall pyelasticsearch. This dependency is mentioned in a very non-obvious note here: http://django-haystack.readthedocs.org/en/latest/installing_search_engines.html#elasticsearch
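
To see which of the two client libraries the current Python environment can import, something like the following sketch may help. It is written for Python 3, the helper name installed_es_clients is made up for illustration, and it only checks importability; it does not tell you which client Haystack actually uses.

```python
import importlib.util


def installed_es_clients():
    """Report which Elasticsearch client packages can be imported (hypothetical helper)."""
    names = ("elasticsearch", "pyelasticsearch")
    # find_spec returns None for a top-level package that is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in installed_es_clients().items():
        print("%s: %s" % (name, "installed" if present else "missing"))
```

Run it inside the same virtualenv as the Django project, before and after the pip commands above, to confirm the change took effect.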

Joey Wilhelm answered Nov 14 '22 22:11