SQLAlchemy connection errors

I'm experiencing some strange bugs which seem to be caused by the connections SQLAlchemy uses, and I can't pin them down exactly. I'm hoping someone has a clue about what's going on here.

We're working on a Pyramid application (version 1.5b1) and use SQLAlchemy (version 0.9.6) for all our database connectivity. Sometimes we get errors related to the db connection or session; most of the time it is a "cursor already closed" or "This Connection is closed" error, but we get other related exceptions too:

(OperationalError) connection pointer is NULL
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed

A conflicting state is already present in the identity map for key (<class '...'>, (1001L,))
This Connection is closed (original cause: ResourceClosedError: This Connection is closed)
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session; lazy load operation of attribute '...' cannot proceed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed
'NoneType' object has no attribute 'twophase'
(OperationalError) connection pointer is NULL
This session is in 'prepared' state; no further SQL can be emitted within this transaction

There is no silver bullet to reproduce them; only by refreshing many times are they bound to happen at some point. So I made a script using multi-mechanize to spam different URLs concurrently and see where and when they happen.
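
For anyone who wants to reproduce this kind of load, a rough stand-in for the multi-mechanize script could look like the sketch below (the URLs, thread counts, and the use of the requests package are made up for illustration):

    import threading
    import requests

    URLS = ["http://localhost:6543/fast", "http://localhost:6543/slow"]

    def hammer(url, n=50):
        # Fire n sequential GET requests at one URL and report anything
        # that doesn't come back as a 200.
        for _ in range(n):
            resp = requests.get(url)
            if resp.status_code != 200:
                print("%s -> %s" % (url, resp.status_code))

    # Five worker threads per URL, all running at once.
    threads = [threading.Thread(target=hammer, args=(u,)) for u in URLS for _ in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()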

It appears the URL triggered doesn't really matter; the errors happen when there are concurrent requests that span a longer time (and other requests get served in between). This seems to indicate some kind of threading problem: either the session or the connection is shared among different threads.

After googling these issues I found a lot of topics; most of them suggest using scoped sessions, but the thing is we already use them:

db_session = scoped_session(sessionmaker(extension=ZopeTransactionExtension(), autocommit=False, autoflush=False))
db_meta = MetaData()
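
(For completeness, the imports behind that snippet would be roughly the following; ZopeTransactionExtension comes from the zope.sqlalchemy package:)

    from sqlalchemy import MetaData
    from sqlalchemy.orm import scoped_session, sessionmaker
    from zope.sqlalchemy import ZopeTransactionExtension
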
  • We have a BaseModel for all our ORM objects:

    BaseModel = declarative_base(cls=BaseModelObj, metaclass=BaseMeta, metadata=db_meta)

  • We use the pyramid_tm tween to handle transactions during the request

  • We hook db_session.remove() to the Pyramid NewResponse event (which is fired after everything has run); a sketch of that subscriber follows this list. I also tried putting it in a separate tween running after pyramid_tm, or even not doing it at all, but none of these seemed to have any effect, so the response event seemed like the cleanest place to put it.

  • We create the engine in the main entry point of our Pyramid project, use a NullPool, and leave connection pooling to pgbouncer. We also configure the session and the bind for our BaseModel here:

    engine = engine_from_config(config.registry.settings, 'sqlalchemy.', poolclass=NullPool)
    db_session.configure(bind=engine, query_cls=FilterQuery)
    BaseModel.metadata.bind = engine
    config.add_subscriber(cleanup_db_session, NewResponse)
    return config.make_wsgi_app()

  • In our app we access all db operations using:

    from project.db import db_session
    ...
    db_session.query(MyModel).filter(...)
    db_session.execute(...)

  • We use psycopg2==2.5.2 to handle the connection to Postgres, with pgbouncer in between

  • I made sure no references to db_session or connections are saved anywhere (which could result in other threads reusing them)
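
The NewResponse subscriber mentioned above is essentially this (a minimal sketch; the real cleanup_db_session may do a bit more):

    from project.db import db_session

    def cleanup_db_session(event):
        # Drop the thread-local session once the response has been generated,
        # so the next request served by this thread starts with a fresh session.
        db_session.remove()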

I also tried the spamming test using different webservers: with waitress and cogen I got the errors very easily, while with wsgiref (which is single-threaded) we unsurprisingly got no errors. Using uwsgi and gunicorn (4 workers, gevent) I didn't get any errors either.

Given the differences between the webservers used, I thought it might have to do with some webservers handling requests in threads and some using new processes (maybe a forking problem)? To complicate matters even more, when I ran some new tests later on, the problem had gone away in waitress but now happened with gunicorn (when using gevent)! I have no clue how to go about debugging this...

Finally, to test what happens to the connection, I attached an attribute to it at the start of the cursor execute and tried to read that attribute back at the end of the execute:

@event.listens_for(Engine, "before_cursor_execute")
def _before_cursor_execute(conn, cursor, stmt, params, context, execmany):
  conn.pdtb_start_timer = time.time()

@event.listens_for(Engine, "after_cursor_execute")
def _after_cursor_execute(conn, cursor, stmt, params, context, execmany):
  print conn.pdtb_start_timer

Surprisingly this sometimes raised an exception: 'Connection' object has no attribute 'pdtb_start_timer'

That struck me as very strange. I found one discussion about something similar: https://groups.google.com/d/msg/sqlalchemy/GQZSjHAGkWM/rDflJvuyWnEJ and tried adding strategy='threadlocal' to the engine, which from what I understand should force one connection per thread. But it didn't have any effect on the errors I'm seeing (besides some unit tests failing, because I need two different sessions/connections for some tests and this forces one connection to be associated).
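
(For reference, that experiment amounted to roughly this change to the engine setup shown earlier; engine_from_config forwards the extra keyword to create_engine, and the threadlocal strategy only exists in older SQLAlchemy releases like 0.9:)

    engine = engine_from_config(config.registry.settings, 'sqlalchemy.',
                                poolclass=NullPool, strategy='threadlocal')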

Does anyone have any idea what might go on here or have some more pointers on how to attack this problem?

Thanks in advance!

Matthijs Blaas


1 Answer

Update: the errors were caused by multiple commands being sent in one prepared SQL statement. Psycopg2 seems to allow this, but apparently it can cause strange issues. The pg8000 connector is stricter and bailed out on the multiple commands; sending one command at a time fixed the issue!
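
The offending statement isn't shown here, but the fix boils down to something like the following (table and column names are invented for illustration):

    # Before: two commands packed into a single execute(); psycopg2 accepts
    # the semicolon-separated string, but it behaved unpredictably here.
    db_session.execute(
        "UPDATE items SET processed = true WHERE id = :id; "
        "INSERT INTO item_log (item_id) VALUES (:id)",
        {"id": item_id})

    # After: one command per execute() call.
    db_session.execute("UPDATE items SET processed = true WHERE id = :id", {"id": item_id})
    db_session.execute("INSERT INTO item_log (item_id) VALUES (:id)", {"id": item_id})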

Matthijs Blaas