I am loading a large dataset of about 250,000 records with 351 columns into a Dash app so that I can display it. However, the app takes a long time to run, and I think it is because of the data I am pulling from REDCap, a separate app I use to collect data. Is there a better way to make my app run faster, even though the data comes from a different app? See the code below:
import dash
import dash_core_components as dcc
import dash_html_components as html
import pandas as pd
from redcap import Project

# REDCap API URL and key
api_url = "enter link"
api_key = "enter key"
project = Project(api_url, api_key)

# Pull the data from REDCap
def data():
    df = project.export_records(format="df", df_kwargs={"index_col": project.field_names[1]})
    return df

df = data()

# Generate an HTML table from the dataframe
def generate_table(dataframe, max_rows=10):
    return html.Table(
        # Header
        [html.Tr([html.Th(col) for col in dataframe.columns])] +
        # Body
        [html.Tr([
            html.Td(dataframe.iloc[i][col]) for col in dataframe.columns
        ]) for i in range(min(len(dataframe), max_rows))]
    )

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

app.layout = html.Div(children=[
    html.H4(children='US Agriculture Exports (2011)'),
    generate_table(df)
])

if __name__ == '__main__':
    app.run_server(debug=True)
Kindly assist on how I can make the app run faster, since the section where I am pulling the data is slowing it down.
A few things here:
1) Exporting your entire dataset from REDCap with project.export_records on every app start is likely an unnecessary step. I'm not 100% sure of the data structure you're working with, but since format="df" already hands you a pandas DataFrame, the conversion isn't the problem; pandas is super fast with structured data. The slow part is the export itself, so restrict it to only what you need.
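For example, PyCap's export_records also accepts fields and records arguments (check the signature for your PyCap version), so you can pull only what you plan to display. A minimal sketch, where "age" and "weight" are hypothetical field names:

# Export only the fields you actually display; "age" and "weight"
# are hypothetical field names -- substitute your own.
needed_fields = [project.field_names[1], "age", "weight"]
df = project.export_records(
    format="df",
    fields=needed_fields,
    df_kwargs={"index_col": project.field_names[1]},
)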
2) Assuming you're not going to display all of your data, I would suggest trimming the dataframe down to the minimum size necessary before building the layout.
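For instance, if the table only ever shows a handful of columns and the first 100 rows (the column names here are hypothetical placeholders):

# Trim the dataframe before handing it to the layout.
display_df = df[["age", "weight"]].head(100)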
3) The generation of HTML for your dataframe is computationally heavy and a bit loopy, relying on positional indexing (dataframe.iloc[i][col] does a fresh lookup for every single cell). I'd rework generate_table like this:
# Generating the body (slightly more readable and a lot less loopy & indexy)
def generate_table(dataframe, max_rows=10):
    header = html.Tr([html.Th(col) for col in dataframe.columns])
    html_all_rows = []
    for idx, row in dataframe[:max_rows].iterrows():
        html_row = html.Tr([html.Td(v) for v in row])
        html_all_rows.append(html_row)
    return html.Table([header] + html_all_rows)
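Here iterrows hands back each row as a Series, so building the cells is a single pass over the values instead of a separate dataframe.iloc[i][col] lookup per cell; for 10 rows it's a small win, but it scales much better if you raise max_rows.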
4) Alternatively, I would suggest using Dash's built-in DataTable. It's a more interactive object than a plain HTML table, and it allows for really neat sorting and filtering. Data input for a DataTable is a JSON-like list of dictionaries, so rendering is fast once your data has been pulled.
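A minimal sketch, assuming a reasonably recent dash_table; sort_action, filter_action, and page_size enable native sorting, filtering, and pagination in the browser:

import dash_table

# Render the (already trimmed) dataframe as an interactive table.
table = dash_table.DataTable(
    columns=[{"name": col, "id": col} for col in df.columns],
    data=df.to_dict("records"),   # JSON-like list of row dictionaries
    sort_action="native",         # sort in the browser
    filter_action="native",       # filter in the browser
    page_size=10,                 # paginate instead of rendering all rows
)

You can then drop table into app.layout in place of generate_table(df).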
5) Again, I would suggest loading into the app only the data you need. I can't imagine all 351 fields would be useful to anyone, and the same goes for 250,000 rows.