Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

duckdb query takes too long to process and return inside Flask application

I have a Flask app and want to use duckdb as a database for several endpoints. My idea is to query the data and return it as a .parquet file. When I test my database with a simple Python script outside of the Flask app, it can query the data and save it as a .parquet in under a second. When I bring that same methodology to the Flask app, it still successfully queries the data and returns it as a .parquet file but it takes roughly 45 seconds. Other endpoints that return a .parquet file -- ones that are pre-staged and do not need to be queried -- can do so in just a second or two. So the issue, apparently, is incorporating duckdb inside my Flask application. Here is a sample boiler plate of what I have:

@test.route('/duckdb', methods = ['GET'])
def duckdb_test():

    con = duckdb.connect(database = '~/flask_db/test.db')

    # get tempfile .parquet
    tmp = tempfile.NamedTemporaryFile(suffix = '.parquet', mode = 'w+b', delete = False)

    # get data
    df = con.sql("SELECT * FROM tbl WHERE name = 'John'").to_df()

    # write to temporary .parquet
    df.to_parquet(tmp.name, engine='pyarrow', index=False)

    return send_file(tmp.name, mimetype='application/octet-stream', as_attachment=True, download_name="request.parquet")

I want to save it as a temp file. Not really sure what's wrong here. Again, it does work, but it just takes way, way too much time. The data being returned is about 12,000 rows in a ~5.5M row database -- but given that it works fairly quickly outside of the Flask app, on the same VM, the size itself shouldn't be an issue.

like image 715
codeweird Avatar asked Sep 16 '25 05:09

codeweird


1 Answers

My Suggestion is not to connect to the DB every time the API is being hitted. Instead create a engine and reuse the same connection. a thing called "Engine" can be helpful to you, so that you don't have to worry about connecting to the db everytime you hit the DB.

check this pypi package here at pypi duck db engine package:

steps to install and use in flask:

Installation:

 pip install duckdb-engine

Engine Creation: plain engine method

Create the engine in some area like where you create and parse the config in flask and use the engine wherever you need.

 eng = create_engine("duckdb:///:memory:")

Engine Creation: Session creation method

eng = create_engine("duckdb:///:memory:")
Base.metadata.create_all(eng)
session = Session(bind=eng)

Import and Use the engine where ever required:

 eng.execute("register", ("dataframe_name", pd.DataFrame(...)))
 eng.execute("select * from dataframe_name")

Sample Examples to use with pandas:

  df = pd.read_sql('users', engine)
like image 117
Surya R Avatar answered Sep 19 '25 09:09

Surya R