
How to make a Dash app run faster when it's slowed by a large imported dataset

I am loading a large dataset of about 250,000 records with 351 columns into a Dash app so that I can display it. However, the app takes a long time to run, and I think it is because of the data I am pulling from REDCap, a separate app I use to collect data. Is there a better way to make my app run faster even though the data is coming from a different app? See the code below:

import dash
import dash_core_components as dcc
import dash_html_components as html
import pandas as pd
from redcap import Project

#redcap api and key
api_url = "enter link"
api_key = "enter key"
project = Project(api_url, api_key)

#call data from redcap
def data():
    df = project.export_records(format="df", df_kwargs={"index_col": project.field_names[1]})
    return df

df = data()


#generate table
def generate_table(dataframe, max_rows=10):
    return html.Table(
        # Header
        [html.Tr([html.Th(col) for col in dataframe.columns])] +

        # Body
        [html.Tr([
            html.Td(dataframe.iloc[i][col]) for col in dataframe.columns
        ]) for i in range(min(len(dataframe), max_rows))]
    )


external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

app.layout = html.Div(children=[
    html.H4(children='US Agriculture Exports (2011)'),
    generate_table(df)
])

if __name__ == '__main__':
    app.run_server(debug=True)

Kindly assist with how I can make the app run faster, since the section where I am calling the data is slowing it down.

asked Dec 13 '19 by LivingstoneM


1 Answer

A few things here:

1) Exporting your data from REDCap with project.export_records on every app start is likely your bottleneck. I'm not 100% sure of the data structure you're working with, but I would suggest converting the exported object into a pandas DataFrame; pandas is super fast with structured data.
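A minimal sketch of that conversion, assuming PyCap's default export (which returns a list of dicts) and pandas imported as pd:

records = project.export_records()  # default format: a list of record dicts
df = pd.DataFrame(records)          # pandas builds the frame directly from those records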

2) Assuming you're not going to display all your data, I would suggest trimming the DataFrame to the minimum size necessary before building any layout components, as in the sketch below.
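A minimal sketch, where display_cols is a hypothetical list standing in for the fields your table actually shows:

display_cols = ["record_id", "age", "visit_date"]  # assumption: replace with your real REDCap field names
df = df.loc[:, display_cols].head(1000)            # keep only the needed columns and cap the row count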

3) The generation of HTML for your dataframe is computationally heavy and a bit loopy, relying on repeated positional indexing. I'd make the following change to the code there:

# Generating the body (slightly more readable and a lot less loopy & indexy)
html_all_rows = []
for _, row in dataframe[:max_rows].iterrows():
    html_row = html.Tr([html.Td(v) for v in row])
    html_all_rows.append(html_row)
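For context, here is how that body could slot into the question's generate_table function (a sketch; the output matches the original up to max_rows):

def generate_table(dataframe, max_rows=10):
    header = [html.Tr([html.Th(col) for col in dataframe.columns])]
    html_all_rows = []
    for _, row in dataframe[:max_rows].iterrows():
        html_all_rows.append(html.Tr([html.Td(v) for v in row]))
    return html.Table(header + html_all_rows)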

4) Alternatively, I would suggest using Dash's built-in DataTable (the dash_table package). It's a more interactive object than a plain HTML table, and it allows for really neat sorting and querying. The data input for a DataTable is a JSON-like list of dictionaries, so rendering is fast once your data is in memory.
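A minimal sketch, assuming the standalone dash_table package from this era of Dash (in Dash 2.x it is available as dash.dash_table):

import dash_table

table = dash_table.DataTable(
    data=df.head(1000).to_dict('records'),               # list of row dicts
    columns=[{"name": c, "id": c} for c in df.columns],
    page_size=10,            # paginate instead of rendering everything at once
    sort_action="native",    # built-in client-side sorting
    filter_action="native",  # built-in client-side filtering
)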

5) Again, I would suggest loading into the app only the data it needs. I can't imagine 350 fields would be useful to anyone, and similarly for 250,000 rows; see the sketch below for restricting the export itself.
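A minimal sketch, assuming your PyCap version's export_records accepts a fields filter (check your version's signature); needed_fields is a hypothetical list:

needed_fields = ["record_id", "age", "visit_date"]  # assumption: your real field names
df = project.export_records(format="df", fields=needed_fields)  # export only the columns you display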

answered Nov 21 '22 by Yaakov Bressler