Creating a Python dictionary-like object from protocol buffers for use in pandas

I currently interface to a server that provides protocol buffers. I can potentially receive a very large number of messages. Currently my process to read the protocol buffers and convert them to a Pandas DataFrame (not a necessary step in general, but Pandas offers nice tools for analyzing datasets) is:

  1. Read the protocol buffer, which arrives as a Google protobuf object
  2. Convert the protobuf object to a dictionary using protobuf_to_dict
  3. Use pandas.DataFrame.from_records to get a DataFrame

This works great, but, given the large number of messages I read from the protobuf, it is quite inefficient to convert to dictionary and then to pandas. My question is: is it possible to make a class that can make a python protobuf object look like a dictionary? That is, remove step 2. Any references or pseudocode would be helpful.

asked Jul 10 '14 at 17:07 by Justin



1 Answer

You might want to check out the ProtoText Python package. It provides in-place, dict-like operations for accessing your protobuf object.

Example usage, assuming you have a Python protobuf object person_obj:

import ProtoText

print(person_obj['name'])        # print person_obj.name
person_obj['name'] = 'David'     # set the attribute 'name' to 'David'
# set 'name' to 'David' again, this time in batch mode
person_obj.update({'name': 'David'})
print('name' in person_obj)      # print whether 'name' is set in person_obj
# Unlike Google's HasField function, the 'in' operator does not raise
# an exception when the field is not defined.
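
If adding a dependency is not an option, a read-only dict-like view can also be sketched with the standard library. This is only an illustration of the idea, not how ProtoText works internally: MessageView and fake_message are hypothetical names, and fake_message is a stand-in that mimics the DESCRIPTOR.fields metadata of a generated message so the sketch runs without protobuf installed.

```python
from collections.abc import Mapping
from types import SimpleNamespace


class MessageView(Mapping):
    """Read-only dict-like view over a message object (hypothetical helper)."""

    def __init__(self, msg):
        self._msg = msg
        # Generated protobuf classes describe their fields on DESCRIPTOR.fields;
        # each field descriptor has a .name attribute.
        self._names = tuple(field.name for field in msg.DESCRIPTOR.fields)

    def __getitem__(self, key):
        if key not in self._names:
            raise KeyError(key)
        # Values are read lazily from the message; nothing is copied up front.
        return getattr(self._msg, key)

    def __iter__(self):
        return iter(self._names)

    def __len__(self):
        return len(self._names)


def fake_message(**values):
    """Stand-in for a generated protobuf message (assumption, for the demo only)."""
    descriptor = SimpleNamespace(
        fields=[SimpleNamespace(name=name) for name in values]
    )
    return SimpleNamespace(DESCRIPTOR=descriptor, **values)


messages = [fake_message(name="Ann", id=1), fake_message(name="Bob", id=2)]
rows = [MessageView(m) for m in messages]

print(rows[0]["name"])
print("email" in rows[0])
print(dict(rows[1]))
```

With real messages, a list of such views could be passed to pandas.DataFrame.from_records in place of the dictionaries from step 2. pandas may still copy each row internally, and nested or repeated fields would come back as protobuf containers rather than plain Python values, so it is worth timing both approaches on real data.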
answered Sep 30 '22 at 10:09 by Zheng Xu