duckdb query takes too long to process and return inside Flask application

Question

I have a Flask app and want to use duckdb as a database for several endpoints. My idea is to query the data and return it as a .parquet file. When I test my database with a simple Python script outside of the Flask app, it can query the data and save it as a .parquet in under a second. When I bring that same methodology to the Flask app, it still successfully queries the data and returns it as a .parquet file but it takes roughly 45 seconds. Other endpoints that return a .parquet file -- ones that are pre-staged and do not need to be queried -- can do so in just a second or two. So the issue, apparently, is incorporating duckdb inside my Flask application. Here is a sample boiler plate of what I have:

@test.route('/duckdb', methods = ['GET'])
def duckdb_test():

    con = duckdb.connect(database = '~/flask_db/test.db')

    # get tempfile .parquet
    tmp = tempfile.NamedTemporaryFile(suffix = '.parquet', mode = 'w+b', delete = False)

    # get data
    df = con.sql("SELECT * FROM tbl WHERE name = 'John'").to_df()

    # write to temporary .parquet
    df.to_parquet(tmp.name, engine='pyarrow', index=False)

    return send_file(tmp.name, mimetype='application/octet-stream', as_attachment=True, download_name="request.parquet")

I want to save it as a temp file. Not really sure what's wrong here. Again, it does work, but it just takes way, way too much time. The data being returned is about 12,000 rows in a ~5.5M row database -- but given that it works fairly quickly outside of the Flask app, on the same VM, the size itself shouldn't be an issue.

Surya R · Accepted Answer

My Suggestion is not to connect to the DB every time the API is being hitted. Instead create a engine and reuse the same connection. a thing called "Engine" can be helpful to you, so that you don't have to worry about connecting to the db everytime you hit the DB.

check this pypi package here at pypi duck db engine package:

steps to install and use in flask:

Installation:

 pip install duckdb-engine

Engine Creation: plain engine method

Create the engine in some area like where you create and parse the config in flask and use the engine wherever you need.

 eng = create_engine("duckdb:///:memory:")

Engine Creation: Session creation method

eng = create_engine("duckdb:///:memory:")
Base.metadata.create_all(eng)
session = Session(bind=eng)

Import and Use the engine where ever required:

 eng.execute("register", ("dataframe_name", pd.DataFrame(...)))
 eng.execute("select * from dataframe_name")

Sample Examples to use with pandas:

  df = pd.read_sql('users', engine)

duckdb query takes too long to process and return inside Flask application

Tags:

python

flask

duckdb

codeweird

1 Answers

Surya R

Recent Activity

Donate For Us

duckdb query takes too long to process and return inside Flask application

Tags:

python

flask

duckdb

codeweird

1 Answers

Surya R

Related questions

Recent Activity

Donate For Us