TL;DR If columns in a Pandas DataFrame themselves contain JSON documents (dicts), how can they be worked with in a Pandas-like fashion?
Currently I'm directly dumping json/dictionary results from a Twitter library (twython) into a Mongo collection (called users here).
from twython import Twython
from pymongo import MongoClient

tw = Twython(...<auth>...)

# Using mongo as object storage
client = MongoClient()
db = client.twitter
user_coll = db.users

user_batch = ...  # collection of user ids
user_dict_batch = tw.lookup_user(user_id=user_batch)
for user_dict in user_dict_batch:
    # Only insert users we haven't already stored
    if user_coll.find_one({"id": user_dict["id"]}) is None:
        user_coll.insert(user_dict)
After populating this database I read the documents into Pandas:
# Pull straight from mongo to pandas
cursor = user_coll.find()
df = pandas.DataFrame(list(cursor))
Which works like magic.
I'd like to be able to mangle the 'status' field Pandas style (directly accessing attributes). Is there a way?
EDIT: Something like df['status:text']. Status has fields like 'text', 'created_at'. One option could be flattening/normalizing this json field like this pull request Wes McKinney was working on.
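As a sketch of what that access pattern can look like today: if the 'status' column holds dicts, individual fields can be pulled out element-wise with apply. The DataFrame below is a hypothetical stand-in for the Mongo-backed one (the 'text' and 'created_at' keys match the Twitter status object described above).

```python
import pandas as pd

# Hypothetical stand-in for the Mongo-backed DataFrame:
# a 'status' column whose cells are dicts
df = pd.DataFrame({
    "id": [1, 2],
    "status": [
        {"text": "hello", "created_at": "2013-01-01"},
        {"text": "world", "created_at": "2013-01-02"},
    ],
})

# Extract one nested field element-wise into a flat column
df["status_text"] = df["status"].apply(lambda s: s["text"])
print(df["status_text"].tolist())  # ['hello', 'world']
```

This gives a plain column you can filter and group on, though it only extracts one field at a time.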
Pandas also has built-in functions for importing JSON directly: use pd.read_json() to load simple JSON and pd.json_normalize() to flatten nested JSON.
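A minimal sketch of json_normalize on records shaped like the Twitter user documents above (the sample data is made up; pd.json_normalize is top-level in pandas 1.0+, earlier versions expose it as pandas.io.json.json_normalize):

```python
import pandas as pd

# Hypothetical records with a nested 'status' dict, as returned by the API
records = [
    {"id": 1, "status": {"text": "hello", "created_at": "2013-01-01"}},
    {"id": 2, "status": {"text": "world", "created_at": "2013-01-02"}},
]

# json_normalize flattens nested dicts into dotted column names,
# e.g. 'status.text' and 'status.created_at'
flat = pd.json_normalize(records)
print(sorted(flat.columns))
```

The nested fields become ordinary columns, so flat["status.text"] works directly, which is essentially the df['status:text'] access the question asks for.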
One solution is just to smash it with the Series constructor:
In [1]: df = pd.DataFrame([[1, {'a': 2}], [2, {'a': 1, 'b': 3}]])
In [2]: df
Out[2]:
0 1
0 1 {u'a': 2}
1 2 {u'a': 1, u'b': 3}
In [3]: df[1].apply(pd.Series)
Out[3]:
a b
0 2 NaN
1 1 3
In some cases you'll want to concat this to the DataFrame in place of the dict row:
In [4]: dict_col = df.pop(1) # here 1 is the column name
In [5]: pd.concat([df, dict_col.apply(pd.Series)], axis=1)
Out[5]:
0 a b
0 1 2 NaN
1 2 1 3
If the nesting goes deeper, you can do this a few times...