
Trouble deleting certain nested JSON objects in python

I am trying to iterate through a list of nested JSON objects (returned from the Twitter REST API via tweepy.api.search) and delete certain objects. I have a list of objects to keep. I want to specify which dictionary entries to keep rather than which to delete, because different tweets have different keys. They all share some keys, such as "text" and "created_at", but other keys appear only in certain tweets.

I am running into two problems.

1) I cannot delete a dictionary item while iterating through the dictionary.

2) Many of the dictionary objects contain nested lists and dictionaries, which I am having trouble accessing.

A small portion of the JSON file I'm iterating through:

{
"statuses": [
    {
        "contributors": null,
        "coordinates": null,
        "created_at": "Thu Nov 12 01:28:07 +0000 2015",
        "entities": {
            "hashtags": [],
            "symbols": [],
            "urls": [
                {
                    "display_url": "twitter.com/thehill/status\u2026",
                    "expanded_url": "https://twitter.com/thehill/status/664581138975989761",
                    "indices": [
                        139,
                        140
                    ],
                    "url": "https://t.co/9zfkg2FixZ"
                }
            ],
            "user_mentions": [
                {
                    "id": 2517854953,
                    "id_str": "2517854953",
                    "indices": [
                        3,
                        19
                    ],
                    "name": "It'sAlwaysPolitical",
                    "screen_name": "politicspodcast"
                }
            ]
        },
        "favorite_count": 0,
        "favorited": false,
        "geo": null
    }
]
}

Each item in the list "statuses" is one tweet, and there are 100 tweets returned per call.

List of items that I want to keep:

keepers_list = [tweetlist["statuses"][i]["coordinates"],
                tweetlist["statuses"][i]["created_at"],
                tweetlist["statuses"][i]["entities"]["urls"]
                ]

I am trying to do:

for item in tweetlist:
    if item not in keepers_list:
        del item

I have tried this exact code, along with more variations and alternative methods than I can recall, but cannot make it work. I have looked at numerous Stack Exchange posts on this topic, but have not been able to adapt any of them to my purpose.

I have tried using

for key in dict.iterkeys(): ...
for value in dict.itervalues(): ...
for key, value in dict.iteritems():

but I cannot make any of them work for what I want to do.

Any help, or just a push in the right direction, would be greatly appreciated.

Bill asked Nov 16 '15 23:11


2 Answers

Never delete items from a list while iterating over it. You can either:

Make a copy of the list to iterate over:

for item in tweetlist[:]:
    ...

Save your desired results in another list:

keep = []
for item in tweetlist:
    if item in keepers_list:
        keep.append(item)
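
Applied to the question's data, both options might look like the minimal sketch below. It assumes the response is a dict whose "statuses" key holds a list of tweet dicts, and that the keepers are expressed as key names rather than values. The names keep_keys, prune_in_place and pruned_copy are illustrative and not part of tweepy, and the sample data is cut down from the question.

```python
# Hypothetical, cut-down sample shaped like the JSON in the question.
tweetlist = {
    "statuses": [
        {
            "coordinates": None,
            "created_at": "Thu Nov 12 01:28:07 +0000 2015",
            "favorite_count": 0,
            "favorited": False,
        }
    ]
}

# Assumption: the keepers are given as key names, not as values.
keep_keys = {"coordinates", "created_at"}


def prune_in_place(tweet, keep_keys):
    """Option 1: iterate over a copy of the keys so deleting is safe."""
    for key in list(tweet):  # list(...) snapshots the keys before any deletion
        if key not in keep_keys:
            del tweet[key]


def pruned_copy(tweet, keep_keys):
    """Option 2: collect the wanted entries in a new dict."""
    keep = {}
    for key, value in tweet.items():
        if key in keep_keys:
            keep[key] = value
    return keep


# Option 2 leaves the original tweets untouched.
copies = [pruned_copy(tweet, keep_keys) for tweet in tweetlist["statuses"]]

# Option 1 modifies each tweet dict directly.
for tweet in tweetlist["statuses"]:
    prune_in_place(tweet, keep_keys)
```

Iterating over list(tweet) instead of tweet itself is what avoids the "dictionary changed size during iteration" error.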

Bernardo Meurer answered Oct 09 '22 20:10


My general rule of thumb in Python is that if I find myself writing a loop, I look for a different approach. In this case, a dictionary comprehension based on the original entry works, assuming keepers_list holds the names of the keys to keep:

keep = {key: tweetlist[key] for key in tweetlist if key in keepers_list}

Unless the original dataset is so large that it has to be processed in place, a comprehension is generally fast and, if relatively short, self-documenting enough to be easily understood.
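
To also reach into nested dictionaries such as "entities" -> "urls" from the question, the comprehension can be applied recursively. The following is a minimal sketch rather than a definitive implementation: keep_only and keep_spec are hypothetical names, and the sample tweet is cut down from the question's data.

```python
def keep_only(obj, spec):
    """Return a copy of obj holding only the keys named in spec.

    spec maps each key to keep to either True (keep the whole value)
    or a nested spec dict (filter the nested dictionary the same way).
    Keys missing from obj are skipped, since not every tweet has every key.
    """
    def pick(value, sub):
        # Recurse only when both the spec and the value are dictionaries.
        if isinstance(sub, dict) and isinstance(value, dict):
            return keep_only(value, sub)
        return value

    return {key: pick(obj[key], sub) for key, sub in spec.items() if key in obj}


# Hypothetical, cut-down tweet shaped like the one in the question.
tweet = {
    "coordinates": None,
    "created_at": "Thu Nov 12 01:28:07 +0000 2015",
    "entities": {
        "hashtags": [],
        "urls": [{"url": "https://t.co/9zfkg2FixZ", "indices": [139, 140]}],
    },
    "favorite_count": 0,
}

# Mirrors the three items the question wants to keep.
keep_spec = {"coordinates": True, "created_at": True, "entities": {"urls": True}}

slim = keep_only(tweet, keep_spec)
```

For a full response, the same function could be mapped over every tweet, for example [keep_only(t, keep_spec) for t in tweetlist["statuses"]].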

Greg answered Oct 09 '22 20:10