 

Protocol buffers for serializing several data objects of a post/comment into a single serialized piece of data

I am developing a social application on top of Java and a Cassandra database. I need to store users' posts and the comments on those posts. My plan is to serialize the data for each post/comment and store the serialized result in a single database column. So for each comment there will be one column holding the following fields in serialized form:

  1. Comment data (String, around 700 characters max)
  2. CommentorId (long type)
  3. CommentTime (timestamp)

Similarly, a post's data will be serialized and stored as a single column.
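For concreteness, the comment record described above might look like the following `.proto` sketch (field and message names are my own, purely illustrative; the timestamp is modeled as milliseconds since the Unix epoch):

```protobuf
syntax = "proto3";

message Comment {
  string text = 1;            // comment body, ~700 characters max
  int64 commentor_id = 2;     // CommentorId
  int64 comment_time_ms = 3;  // CommentTime, as ms since the Unix epoch
}
```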

Fast deserialization will be required every time the frontend retrieves a post.

I am looking at protocol buffers as a probable solution. Is it the right choice for this task? I need a high-performance, fast serialization and deserialization mechanism that can stand up to heavy usage in the application.

Also, is it possible to send the data to the client in serialized form and deserialize it there, i.e. for server-to-client communication?

Rajat Gupta asked Mar 03 '11

2 Answers

Protocol buffers certainly provide serialization, although the RPC side of things is left to your imagination (often something simple and socket-based works very well).

The data types are all well supported by protobuf (although you might want to use something like milliseconds since the Unix epoch for the date). Note, though, that protobuf doesn't include compression (unless you also apply gzip etc. to the stream). So the message will be "a bit" longer than the string (which always uses UTF-8 encoding in protobuf). I say "a bit" because the varint algorithm for integer types could give anything between 1 and 10 bytes each for the id and timestamp, depending on their magnitude, plus a few (probably 3) bytes for the field headers.
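The varint sizing can be seen with a hand-rolled encoder. This is a sketch of the same base-128 scheme protobuf applies to non-negative integer fields (the sample values are mine, chosen to show the size range for an id and a millisecond timestamp):

```java
import java.io.ByteArrayOutputStream;

public class VarintDemo {
    // Encode a non-negative long as a base-128 varint: low 7 bits per byte,
    // high bit set on every byte except the last.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodeVarint(127L).length);               // small id: 1 byte
        System.out.println(encodeVarint(1_000_000L).length);         // larger id: 3 bytes
        System.out.println(encodeVarint(1_677_721_600_000L).length); // ms timestamp: 6 bytes
    }
}
```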

If that sounds about right, then it should work fine. If you have lots of text data, though, you might want to run the protobuf stream through gzip as well. Java has excellent protobuf support via the main Google trunk.

Marc Gravell answered Sep 18 '22


I don't know if it fits your specific case, but I have seen suggestions to store a JSON representation of the data that can be sent directly to the browser. If you don't need any further processing steps involving POJOs, then this or a similar approach might be a (fast) way to go.

Martin Klinke answered Sep 18 '22