I have a pandas DataFrame that I want to return as a Flask Response object in a Flask application. Currently I am converting it to JSON:
df = df.to_json()
return Response(df, status=200, mimetype='application/json') 
The dataframe is really large, on the order of 5,000,000 rows by 10 columns. On the client side I deserialize it with:
df = pd.read_json(response.text)
As the number of URL request parameters grows, the dataframe grows as well. Deserialization consistently takes three to four times as long as serialization, which I would like to avoid: e.g. serialization takes 15-20 seconds, while deserialization takes 60-70 seconds.
Can protobuf help in this case, i.e. is there a way to convert a pandas DataFrame to a protobuf object? Alternatively, is there a way to send this JSON gzipped from Flask, with the appropriate mimetype? I believe protobuf and gzip are comparable in timing and efficiency.
What's the best solution in such a scenario?
Thanks in advance.
I ran into the same problem recently. I solved it by iterating over the rows of my DataFrame and calling add() on the message's repeated field in that loop, filling each entry from the DataFrame. You can then gzip the serialized string output.
i.e. something along the lines of:
import gzip

# protobuf_obj is an instance of your generated message type, assumed
# here to have a repeated field named "rows"; add() appends a new
# entry to that repeated field.
for _, row in df.iterrows():
    protobuf_obj.rows.add(val1=row[col1], val2=row[col2])
proto_str = protobuf_obj.SerializeToString()
return gzip.compress(proto_str)
Given that this question hasn't been answered in 9 months, I'm not sure there's a better solution but definitely open to hearing one if there is!
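As for the gzip half of the question: Flask does not compress responses for you, but you can compress the payload yourself and set the Content-Encoding header on the Response. A minimal sketch of the round trip, using only the stdlib (the toy payload stands in for df.to_json(); the Flask wiring is shown in comments and assumed, not tested here):

```python
import gzip
import json

# Stand-in for df.to_json() -- in the real app this would be the
# DataFrame's JSON payload.
payload = json.dumps({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})

# Server side: compress the JSON bytes before building the Response.
compressed = gzip.compress(payload.encode("utf-8"))

# In the Flask view you would then return something like:
#   resp = Response(compressed, status=200, mimetype="application/json")
#   resp.headers["Content-Encoding"] = "gzip"
#   return resp
# HTTP clients such as requests (and browsers) transparently
# decompress when they see Content-Encoding: gzip, so the client-side
# deserialization code does not need to change.

# Client side (if decompressing manually):
restored = gzip.decompress(compressed).decode("utf-8")
assert restored == payload
```

Whether this beats protobuf depends on your data; for large, repetitive JSON the gzip route is far less work to adopt, since it needs no schema and no change on the client.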