Here is the situation I'm facing:
I just wrote a Flask app where users can enter a text review and the app returns the most similar reviews from our dataset. It's basically an NLP project, and the machine learning model has already been trained. The problem right now is that the model is about 2.5 GB, and every time a user submits something, the app loads that model again to do the calculation.
I am OK with the machine learning side but a total newbie in web development. After some googling, I found that caching in Flask might be the solution, and I tried to follow this tutorial: http://brunorocha.org/python/flask/using-flask-cache.html
However, I failed to implement it. Could anyone give me some advice on the correct way to do this? If Flask-Cache is "the" solution, I will keep digging into it and hopefully make it work.
I suggest loading the model once, when you start your application. This can be done by simply loading the model in the main block. The first time you start the app it will take a while, but every subsequent call to the predict API will be fast:
from flask import Flask, request, jsonify
from keras.models import load_model  # assuming a Keras model, given the .h5 file

app = Flask(__name__)
model = None

@app.route('/predict', methods=['POST', 'OPTIONS'])
def predict():
    tokenized = request.get_json()  # read the user's input from the request body
    # "do something": compute the most similar reviews using `model`
    result = {}  # placeholder for your actual result
    return jsonify(result)

if __name__ == '__main__':
    model = load_model('/model/files/model.h5')  # loaded once, at startup
    app.run(host='0.0.0.0', port=5000)
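One caveat: the if __name__ == '__main__': block only runs under Flask's built-in development server. If you later deploy behind a WSGI server such as gunicorn, load the model at module import time instead. A minimal sketch, reusing the same (assumed Keras) model path as above:

from flask import Flask, request, jsonify
from keras.models import load_model

app = Flask(__name__)
model = load_model('/model/files/model.h5')  # runs once per worker process, at import time

@app.route('/predict', methods=['POST'])
def predict():
    tokenized = request.get_json()
    result = {}  # your similarity calculation using `model`
    return jsonify(result)

With gunicorn you can also start with the --preload flag so the model is loaded once in the master process before workers are forked, rather than once per worker, which matters with a 2.5 GB model. You can then test the endpoint with something like:

curl -X POST -H 'Content-Type: application/json' -d '{"review": "great product"}' http://localhost:5000/predict

(the {"review": ...} payload is just an illustrative shape; use whatever input format your tokenizer expects).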