Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL connection pool in python?

I'm trying to process large amount of data using Python and maintaining processing status in MySQL. However, I'm surprised there is no standard connection pool for python-mysql (like HikariCP in Java).

I initially started with PyMySQL, things were great until the program ran for first few hours. After few hours, things started to fail. I was getting lot of errors like:

pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 99] Cannot assign requested address)")

Moreover, lot of ports were stuck in TIME_WAIT state because I'm opening and closing connections too frequently because of lack of connection pooling

/d/p/950 ❯❯❯ netstat -nt | wc -l
84752

Per this and this, I tried to set tcp_fin_timeout and ip_local_port_range, but hardly anything improved.

echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 15000 65000 > /proc/sys/net/ipv4/ip_local_port_range

Then I found out that MySQL provides mysql.connector which comes with pooling functionality. After doing all that performance actually deteriorated. More processes started to get failed. I'm using Python's multiprocessing module to simultaneously run 29 processes(multiprocessing.Pool picked this no by default) on a 24 core machine. Following was the code, of course I was using .my.cnf to pass all the credential to avoid committing them to git :

    import mysql.connector
    from mysql.connector import pooling
    conn_pool = pooling.MySQLConnectionPool(pool_name="mypool1",
                                              pool_size=pooling.CNX_POOL_MAXSIZE,
                                              option_files=MYSQL_CONFIG,
                                              option_groups=MYSQL_GROUP_NODE1,
                                              allow_local_infile=True)
conn = conn_pool.get_connection()

Finally, reverted back to old code. Still using PyMySQL and though errors are less frequent it is still causing a major problem. I looked at SQLAlchemy and couldn't really found much of a documentation around pooling.

I'm wondering how's everyone else dealing with mysql-python connection pooling issue? I really believe there should be something out there so that I don't have to reinvent the wheel.

Any pointers are much appreciated.

like image 330
Geek Avatar asked Mar 25 '18 02:03

Geek


1 Answers

DBUtils implements MySQL (and generally claims to support abritrary DB-API 2 compliant database interfaces) user-sized connection pool PooledDB, thead-mapped pool PersistentDB and SteadyDB (see functionality section). The latter should fit your case where multiprocessing.Pool creates worker processes with managed persistent database connection each. It is described as:

DBUtils.SteadyDB is a module implementing "hardened" connections to a database, based on ordinary connections made by any DB-API 2 database module. A "hardened" connection will transparently reopen upon access when it has been closed or the database connection has been lost or when it is used more often than an optional usage limit.

You can use it with PyMySQL like:

import pymysql
from DBUtils.SteadyDB import connect

db = connect(
  creator = pymysql, # the rest keyword arguments belong to pymysql
  user = 'guest', password = '', database = 'name', 
  autocommit = True, charset = 'utf8mb4', 
  cursorclass = pymysql.cursors.DictCursor)

Also see this related question for more examples.

like image 118
saaj Avatar answered Oct 17 '22 20:10

saaj