
using BufferedWriter in flask whooshalchemy

Hi, I am running a Flask app with a PostgreSQL database. I get LockErrors when using multiple workers. I learned that this is because the Whoosh search locks the database:

http://stackoverflow.com/questions/36632787/postgres-lockerror-how-to-investigate

As explained in this link, I have to use BufferedWriter. I googled around, but I really can't figure out how to implement it. Here is my database setup in terms of Whoosh:

import sys
if sys.version_info >= (3, 0):
    enable_search = False
else:
    enable_search = True
    import flask.ext.whooshalchemy as whooshalchemy

class User(db.Model):
    __searchable__ = ['username','email','position','institute','id'] # these fields will be indexed by whoosh

    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(100), index=True)
    ...

    def __repr__(self):
        return '<User %r>' % (self.username)

if enable_search:
    whooshalchemy.whoosh_index(app, User)

Help is much appreciated, thanks. carl

EDIT: If there is no capability for parallel access in Flask-WhooshAlchemy, are there any alternatives you could suggest?

carl asked Apr 22 '16 19:04


1 Answer

As you can read here:

http://whoosh.readthedocs.io/en/latest/threads.html

Only one writer can hold the lock at a time. A buffered writer keeps your data in memory for some time, but at some point your objects have to be stored, and that means taking the lock.

According to that document, the async writer is what you are looking for, but be careful: it tries to store your data, and if that fails because the index is locked, it starts an additional thread and retries. Suppose you throw 1000 new items at it; potentially you end up with something like 1000 threads. It can be better to treat each insert as a task and send it to a separate thread. If there are many processes, you can stack those tasks: for instance, insert 10 and wait. If those 10 are inserted as a batch in a short time, that will work, at least for some time...
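The "treat each insert as a task" idea above can be sketched with the standard library alone (Python 3): request handlers put documents on a queue, and one dedicated thread drains it in batches, so only that thread ever needs the write lock. This is a minimal sketch under assumptions, not a Whoosh or Flask-WhooshAlchemy API: `BatchingIndexer`, `commit_batch`, and the batch size of 10 are hypothetical. `commit_batch` stands in for code that opens a writer, adds the documents, and commits.

```python
import queue
import threading


class BatchingIndexer:
    """Funnel index writes through one thread so only it takes the write lock."""

    _STOP = object()  # sentinel that tells the worker thread to finish

    def __init__(self, commit_batch, batch_size=10):
        # commit_batch is a hypothetical callable: it receives a list of
        # documents and is expected to open a writer, add them, and commit.
        self._commit_batch = commit_batch
        self._batch_size = batch_size
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def add(self, doc):
        """Called from any thread; never touches the index directly."""
        self._queue.put(doc)

    def close(self):
        """Flush everything queued so far and stop the worker thread."""
        self._queue.put(self._STOP)
        self._thread.join()

    def _run(self):
        batch = []
        while True:
            item = self._queue.get()  # block until there is work
            stopping = item is self._STOP
            if not stopping:
                batch.append(item)
            # Commit when the batch is full, the queue is idle, or we stop.
            if batch and (stopping or len(batch) >= self._batch_size
                          or self._queue.empty()):
                self._commit_batch(batch)
                batch = []
            if stopping:
                return


# Usage: a list stands in for the real index so the sketch runs anywhere.
committed = []
indexer = BatchingIndexer(lambda docs: committed.append(list(docs)), batch_size=10)
for n in range(25):
    indexer.add({"path": "/doc/%d" % n})
indexer.close()
```

Note that this only serializes writes inside one process. With several worker processes, the queue would have to live in a single separate process or an external task queue, otherwise each worker still competes for the same lock.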

Edit

Sample with the async writer (AsyncWriter). To use a buffered writer instead, swap the import and the class name for BufferedWriter.

import os
from whoosh import index
from whoosh.fields import SchemaClass, TEXT, KEYWORD, ID

if not os.path.exists("data"):
    os.mkdir("data")

# http://whoosh.readthedocs.io/en/latest/schema.html
class MySchema(SchemaClass):
    path = ID(stored=True)
    title = TEXT(stored=True)
    icon = TEXT
    content = TEXT(stored=True)
    tags = KEYWORD

# http://whoosh.readthedocs.io/en/latest/indexing.html
ix = index.create_in("data", MySchema, indexname="myindex")

writer = ix.writer()
writer.add_document(title=u"My document", content=u"This is my document!",
                    path=u"/a", tags=u"first short", icon=u"/icons/star.png")
writer.add_document(title=u"Second try", content=u"This is the second example.",
                    path=u"/b", tags=u"second short", icon=u"/icons/sheep.png")
writer.add_document(title=u"Third time's the charm", content=u"Examples are many.",
                    path=u"/c", tags=u"short", icon=u"/icons/book.png")
writer.commit()

# needed to release lock
ix.close()

#http://whoosh.readthedocs.io/en/latest/api/writing.html#whoosh.writing.AsyncWriter
from whoosh.writing import AsyncWriter

ix = index.open_dir("data", indexname="myindex")

writer = AsyncWriter(ix)
writer.add_document(title=u"My document no 4", content=u"This is my document!",
                    path=u"/a", tags=u"four short", icon=u"/icons/star.png")
writer.add_document(title=u"5th try", content=u"This is the second example.",
                    path=u"/b", tags=u"5 short", icon=u"/icons/sheep.png")
writer.add_document(title=u"Number six is coming", content=u"Examples are many.",
                    path=u"/c", tags=u"short", icon=u"/icons/book.png")
writer.commit()

Michał Zaborowski answered Nov 07 '22 08:11