I have a Flask app and want to use duckdb as a database for several endpoints. My idea is to query the data and return it as a .parquet file. When I test my database with a simple Python script outside of the Flask app, it can query the data and save it as a .parquet in under a second. When I bring that same methodology to the Flask app, it still successfully queries the data and returns it as a .parquet file but it takes roughly 45 seconds. Other endpoints that return a .parquet file -- ones that are pre-staged and do not need to be queried -- can do so in just a second or two. So the issue, apparently, is incorporating duckdb inside my Flask application. Here is a sample boilerplate of what I have:
@test.route('/duckdb', methods = ['GET'])
def duckdb_test():
    con = duckdb.connect(database = '~/flask_db/test.db')
    # get tempfile .parquet
    tmp = tempfile.NamedTemporaryFile(suffix = '.parquet', mode = 'w+b', delete = False)
    # get data
    df = con.sql("SELECT * FROM tbl WHERE name = 'John'").to_df()
    # write to temporary .parquet
    df.to_parquet(tmp.name, engine='pyarrow', index=False)
    return send_file(tmp.name, mimetype='application/octet-stream', as_attachment=True, download_name="request.parquet")
I want to save it as a temp file. Not really sure what's wrong here. Again, it does work, but it just takes way, way too much time. The data being returned is about 12,000 rows in a ~5.5M row database -- but given that it works fairly quickly outside of the Flask app, on the same VM, the size itself shouldn't be an issue.
My suggestion is not to connect to the database every time the endpoint is hit. Instead, create an engine once and reuse its connection. A SQLAlchemy-style "Engine" can help here, so that you don't have to worry about reconnecting to the database on every request.
Check out the duckdb-engine package on PyPI.
Steps to install and use it in Flask:
Installation:
pip install duckdb-engine
Engine creation (plain engine method):
Create the engine once, for example where you create and parse the config in Flask, and use it wherever you need it. Note that the `:memory:` URL below opens an in-memory database.
from sqlalchemy import create_engine
eng = create_engine("duckdb:///:memory:")
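Engine creation (session method):
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
# Base is assumed to be your own SQLAlchemy declarative base class
eng = create_engine("duckdb:///:memory:")
Base.metadata.create_all(eng)
session = Session(bind=eng)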
Import and use the engine wherever required:
# Note: Engine.execute() exists only in SQLAlchemy 1.x; it was removed in 2.0
eng.execute("register", ("dataframe_name", pd.DataFrame(...)))
eng.execute("select * from dataframe_name")
Sample example with pandas:
df = pd.read_sql('users', eng)