How to serialize/deserialize Pandas DataFrame to and from ProtoBuf/Gzip in a RESTful Flask App?

I have a pandas DataFrame that I return as a Flask Response object in a Flask application. Currently I convert it to a JSON string:

df = df.to_json()
return Response(df, status=200, mimetype='application/json')

The DataFrame is very large, on the order of 5,000,000 rows x 10 columns. On the client side I deserialize it with:

df = pd.read_json(response.text)

As the number of URL request parameters grows, the DataFrame grows as well. Deserialization consistently takes several times longer than serialization, which I would like to avoid; for example, serialization takes 15-20 seconds while deserialization takes 60-70 seconds.

Is there a way protobuf can help here, i.e. can I convert the pandas DataFrame to a protobuf object? Alternatively, can I send this JSON gzip-compressed through Flask (with a Content-Encoding: gzip header)? I believe protobuf and gzip offer comparable timing and efficiency.
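For reference, the gzip route can be sketched with the standard library alone. The payload below is a stand-in for the output of df.to_json(); in Flask you would return the compressed bytes in a Response with mimetype 'application/json' and a Content-Encoding: gzip header, and most HTTP clients decompress transparently.

```python
import gzip
import json

# Stand-in for df.to_json() output (assumed shape, for illustration only)
payload = json.dumps({"col1": list(range(1000)), "col2": list(range(1000))})
raw = payload.encode("utf-8")

# Server side: compress the JSON bytes before sending.
# In Flask: Response(compressed, mimetype='application/json',
#                    headers={'Content-Encoding': 'gzip'})
compressed = gzip.compress(raw)

# Client side: decompressing by hand looks like this; libraries such as
# requests do it automatically when Content-Encoding is gzip.
restored = gzip.decompress(compressed).decode("utf-8")
assert restored == payload
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
```

Repetitive numeric JSON like this typically compresses well, so the transfer shrinks substantially, though the client still pays the JSON parsing cost after decompression.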

What's the best solution in such a scenario?

Thanks in advance.

asked Jul 15 '16 by user6591903

1 Answer

I ran into the same problem recently. I solved it by iterating over the rows of my DataFrame, calling add() on a repeated field of my protobuf message inside that loop using values from each row, and then gzipping the serialized string output.

i.e. something along the lines of:

import gzip

# protobuf_obj is a generated protobuf message; add() lives on its
# repeated field (named "rows" here for illustration), not on the
# message itself
for _, row in df.iterrows():
    protobuf_obj.rows.add(val1=row[col1], val2=row[col2])
proto_str = protobuf_obj.SerializeToString()
return gzip.compress(proto_str)
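The loop above assumes a message with a repeated field. A hypothetical .proto schema matching it might look like this (message and field names are illustrative, not from the original answer):

```protobuf
// rows.proto -- hypothetical schema; val1/val2 mirror the answer's fields
syntax = "proto3";

message Row {
  double val1 = 1;
  double val2 = 2;
}

message DataFrameProto {
  repeated Row rows = 1;
}
```

After running protoc to generate Python code, DataFrameProto().rows.add(val1=..., val2=...) appends one Row per DataFrame row; in the generated Python API, add() is a method of the repeated-field container, while SerializeToString() belongs to the enclosing message.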

Given that this question hasn't been answered in 9 months, I'm not sure there's a better solution but definitely open to hearing one if there is!

answered Oct 05 '22 by erkyky