Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing Python MySQL/Connector Speed

I have a table in MySQL that contains the following columns:

id int(11)
contract_id int(11)
datetime datetime
open decimal(18, 10)
high decimal(18, 10)
low decimal(18, 10)
close decimal(18, 10)

The table is fairly large (> 300 millions rows), but queries made within the database are executed within half a second even when they return 300,000 rows. However, when I retrieve data from Python it is very slow (same request goes from 0.5s in MySQL Workbench to 34s in Python):

import pandas as pd
import mysql.connector 

con = mysql.connector.connect(**CONFIG) 
cur = con.cursor()

def get_data1():
    df = pd.read_sql(
        """
        SELECT datetime, open, high, low, close 
        FROM prices
        WHERE contract_id = 1 
            AND datetime >= '2015-01-01 09:00:00' 
            AND datetime <= '2015-10-15 16:00:00'; 
        """, con)
    return df

I found that exporting data from MySQL to a flat file and then reading it in Python was 23x faster than a direct query on the database:

def get_data2():
    cur.execute(
        """
        SELECT datetime, open, high, low, close 
        FROM prices
        WHERE contract_id = 1 
            AND datetime >= '2015-01-01 09:00:00' 
            AND datetime <= '2015-10-15 16:00:00'
        INTO OUTFILE 'C:/Data/Temp.csv'
        FIELDS TERMINATED BY ','
        ENCLOSED BY '"'
        LINES TERMINATED BY "\n";
        """)
    return pd.read_csv('C:/Data/Temp.csv')

How can it be? I suspect that this is related to the data type conversion. Any idea how to make the function get_data1 faster without having to first export to CSV? Thank you.

like image 624
agiap Avatar asked Oct 19 '22 00:10

agiap


1 Answers

The following solution is 3x faster than the initial one (12s vs. 34s):

import mysql.connector
con = mysql.connector.connect(**CONFIG)
cur = con.cursor()
class MySQLConverter(mysql.connector.conversion.MySQLConverter):
    def _DECIMAL_to_python(self, value, desc=None):
        return float(value)
    _NEWDECIMAL_to_python = _DECIMAL_to_python

con.set_converter_class(MySQLConverter)

It converts MySQL decimal type into Python float instead of decimal.Decimal, which is faster. It is still much slower than the "CSV solution" that takes 1.57s to complete. Still digging...

like image 79
agiap Avatar answered Oct 22 '22 01:10

agiap