
Access a particular field in arbitrarily nested JSON data [duplicate]

I have some JSON data like:

{
  "status": "200",
  "msg": "",
  "data": {
    "time": "1515580011",
    "video_info": [
      {
          "announcement": "{\"announcement_id\":\"6\",\"name\":\"INS\\u8d26\\u53f7\",\"icon\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-08-18_19:44:54\\\/ins.png\",\"icon_new\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-10-20_22:24:38\\\/4.png\",\"videoid\":\"15154610218328614178\",\"content\":\"FOLLOW ME PLEASE\",\"x_coordinate\":\"0.22\",\"y_coordinate\":\"0.23\"}",
          "announcement_shop": "",

etc.

How do I grab the content "FOLLOW ME PLEASE"? I tried using

replay_data = raw_replay_data['data']['video_info'][0]
announcement = replay_data['announcement']

But now announcement is a string representing more JSON data, so I can't continue indexing: announcement['content'] results in TypeError: string indices must be integers.

How can I get the desired string in the "right" way, i.e. respecting the actual structure of the data?

aquatic7 asked Jan 10 '18 18:01


2 Answers

In a single line -

>>> import json
>>> json.loads(data['data']['video_info'][0]['announcement'])['content']
'FOLLOW ME PLEASE'

To understand how to access nested data like this (so you don't have to ask again), start by looking closely at its structure.

First, let's lay out your data nicely. You can either use json.dumps(data, indent=4), or you can use an online tool like JSONLint.com.

{
    'data': {
        'time': '1515580011',
        'video_info': [{
            'announcement': (    # ***
            """{
                "announcement_id": "6",
                "name": "INS\\u8d26\\u53f7",
                "icon": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-08-18_19:44:54\\\\/ins.png",
                "icon_new": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-10-20_22:24:38\\\\/4.png",
                "videoid": "15154610218328614178",
                "content": "FOLLOW ME PLEASE",
                "x_coordinate": "0.22",
                "y_coordinate": "0.23"
            }"""),
            'announcement_shop': ''
        }]
    },
    'msg': '',
    'status': '200'
} 

*** Note that the value under the announcement key is actually a string containing more JSON data, which I've laid out on separate lines.
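As a rough sketch of the pretty-printing step mentioned above: the snippet below uses a trimmed-down stand-in for the parsed response (only a few fields, assumed for illustration) and shows that json.dumps(data, indent=4) lays out the outer structure while the embedded announcement stays a plain string.

```python
import json

# Trimmed-down stand-in for the parsed response; only a few fields are kept.
data = {
    "status": "200",
    "data": {
        "video_info": [
            {"announcement": json.dumps({"content": "FOLLOW ME PLEASE"})}
        ]
    },
}

# indent=4 puts each nesting level on its own indented line.
pretty = json.dumps(data, indent=4)
print(pretty)

# The embedded document is still a plain str at this point, not a dict.
inner = data["data"]["video_info"][0]["announcement"]
print(type(inner).__name__)
```

In the printed layout the announcement value appears as one long quoted string with escaped quotes, which is the visual hint that it needs a second decoding pass.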

First, find out where your data resides. You're looking for the data in the content key, which is accessed by the announcement key, which is part of a dictionary inside a list of dicts, which can be accessed by the video_info key, which is in turn accessed by data.

So, in summary, "descend" the ladder that is "data" using the following "rungs" -

  1. data, a dictionary
  2. video_info, a list of dicts
  3. announcement, a key in the first dict of that list, whose value is a string containing JSON
  4. content, a key inside that JSON once it has been decoded.

First,

i = data['data']

Next,

j = i['video_info']

Next,

k = j[0] # since this is a list

If you only want the first element, this suffices. Otherwise, you'd need to iterate:

for k in j:
    ...

Next,

l = k['announcement']

Now, l is a string containing JSON data. Load it -

import json
m = json.loads(l)

Lastly,

content = m['content']

print(content)
FOLLOW ME PLEASE

This should hopefully serve as a guide should you have future queries of this nature.
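The steps above can be folded into one function. The sketch below is only an illustration: announcement_contents is a hypothetical helper name, and the sample data is a trimmed stand-in shaped like the question's response. It loops over every entry in video_info and skips values that are empty (like announcement_shop in the data above) or not valid JSON, since json.loads raises an error on those.

```python
import json

def announcement_contents(data):
    """Collect the 'content' value of every decodable announcement.

    Hypothetical helper, not part of the original answer.
    """
    contents = []
    for entry in data["data"]["video_info"]:
        raw = entry.get("announcement", "")
        if not raw:
            # Empty string: nothing to decode.
            continue
        try:
            decoded = json.loads(raw)
        except json.JSONDecodeError:
            # Skip values that are not valid JSON.
            continue
        if isinstance(decoded, dict) and "content" in decoded:
            contents.append(decoded["content"])
    return contents

# Stand-in data shaped like the question's response, trimmed to two entries.
sample = {
    "data": {
        "video_info": [
            {"announcement": '{"content": "FOLLOW ME PLEASE"}'},
            {"announcement": ""},
        ]
    }
}

print(announcement_contents(sample))  # ['FOLLOW ME PLEASE']
```

Whether to skip bad entries silently or raise is a judgment call; skipping is assumed here only to keep the example short.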

cs95 answered Oct 18 '22 21:10


You have nested JSON data; the string associated with the 'announcement' key is itself another, separate, embedded JSON document.

You'll have to decode that string first:

import json

replay_data = raw_replay_data['data']['video_info'][0]
announcement = json.loads(replay_data['announcement'])
print(announcement['content'])

then handle the resulting dictionary from there.
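If embedded documents can show up at unknown depths, one possible approach is to decode them recursively. The following is a sketch under assumptions, not part of the original answer: decode_nested is a hypothetical helper, and it deliberately leaves strings such as "200" untouched by only accepting results that are a dict or a list.

```python
import json

def decode_nested(value):
    """Recursively decode strings that hold a JSON object or array.

    Hypothetical helper; strings that parse to scalars (e.g. "200") are left alone.
    """
    if isinstance(value, dict):
        return {key: decode_nested(item) for key, item in value.items()}
    if isinstance(value, list):
        return [decode_nested(item) for item in value]
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            # Not JSON at all (ordinary text or an empty string).
            return value
        if isinstance(parsed, (dict, list)):
            # The string was an embedded document; keep descending into it.
            return decode_nested(parsed)
    return value

# Trimmed stand-in for the question's data.
raw_replay_data = {
    "status": "200",
    "data": {
        "video_info": [
            {"announcement": '{"content": "FOLLOW ME PLEASE"}', "announcement_shop": ""}
        ]
    },
}

decoded = decode_nested(raw_replay_data)
print(decoded["data"]["video_info"][0]["announcement"]["content"])  # FOLLOW ME PLEASE
```

This trades precision for convenience: any string that happens to look like a JSON object or array gets decoded, so it is best suited to data whose shape you already trust.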

Martijn Pieters answered Oct 18 '22 21:10