I am currently querying data into a dataframe via the pandas.io.sql.read_sql() command. I wanted to parallelize the calls similar to what this guy is advocating: (Embarrassingly parallel database calls with Python (PyData Paris 2015))
Something like (very general):
from psycopg2.pool import ThreadedConnectionPool
from parallel_connection import ParallelConnection  # from the talk's repo

pools = [ThreadedConnectionPool(1, 20, dsn=d) for d in dsns]  # one pool per database
connections = [pool.getconn() for pool in pools]              # one connection per pool
parallel_connection = ParallelConnection(connections)
pandas_cursor = parallel_connection.cursor()
pandas_cursor.execute(my_query)
Is something like that possible?
Yes, this should work, although with the caveat that you'll need to change parallel_connection.py from the talk that you cite. In that code there's a fetchall
function which executes each of the cursors in parallel, then combines the results. This is the core of what you'll change:
Old Code:
def fetchall(self):
    results = [None] * len(self.cursors)
    def do_work(index, cursor):
        results[index] = cursor.fetchall()
    self._do_parallel(do_work)
    return list(chain(*[rs for rs in results]))
New Code:
def fetchall(self):
    results = [None] * len(self.sql_connections)
    def do_work(index, sql_connection):
        sql, conn = sql_connection  # store a tuple of sql/conn instead of a cursor
        results[index] = pd.read_sql(sql, conn)
    self._do_parallel(do_work)
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    return pd.concat(results, ignore_index=True)
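For completeness, here is one hypothetical way the modified class could be driven end to end. This is only a sketch: the sql_connections constructor argument is an assumption about how you would adapt the repo's class, not part of its actual API, and dsns/my_query are the placeholders from the question.

import pandas as pd
from psycopg2.pool import ThreadedConnectionPool
from parallel_connection import ParallelConnection  # modified as above

pools = [ThreadedConnectionPool(1, 20, dsn=d) for d in dsns]
connections = [pool.getconn() for pool in pools]
# One (sql, conn) pair per shard, matching the modified fetchall
sql_connections = [(my_query, conn) for conn in connections]
parallel_connection = ParallelConnection(sql_connections)
df = parallel_connection.fetchall()  # one DataFrame with every shard's rows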
Repo: https://github.com/godatadriven/ParallelConnection
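If you would rather not patch the repo at all, the same fan-out-then-concat pattern can be written directly with the standard library. This is a minimal self-contained sketch, not the talk's code: read_shard and read_parallel are made-up names, and dsns/sql are placeholders for your shard DSNs and query.

import pandas as pd
import psycopg2
from concurrent.futures import ThreadPoolExecutor

def read_shard(dsn, sql):
    # One connection per thread; psycopg2 releases the GIL while it
    # waits on the server, so the shard queries genuinely overlap.
    conn = psycopg2.connect(dsn)
    try:
        return pd.read_sql(sql, conn)
    finally:
        conn.close()

def read_parallel(dsns, sql):
    # Fan the same query out to every shard, then stack the results.
    with ThreadPoolExecutor(max_workers=len(dsns)) as pool:
        frames = list(pool.map(lambda d: read_shard(d, sql), dsns))
    return pd.concat(frames, ignore_index=True)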