We are trying to retrieve ALL the posts, with associated comments and images, made to our group in the last year. I've tried using the Graph API to do this, but pagination means I have to get the data, copy the "next" link, and run the query again. Unfortunately, that is a LOT of work, since there are over 2 million posts in the group.
Does ANYONE know of a way to do this without spending a few days clicking? Also consider that the group has 4000+ members and is growing every day, with, on average, about 1000 posts a DAY at the moment.
For the curious, the PLAN is to cull the herd... I am HOPELESS at programming and have recently started learning Python...
I did it like this; you'll probably have to iterate through all the pages until data comes back empty. Note this is the Python 2.x version.
from facepy import GraphAPI
import json
group_id = "YOUR_GROUP_ID"
access_token = "YOUR_ACCESS_TOKEN"
graph = GraphAPI(access_token)
# https://facepy.readthedocs.org/en/latest/usage/graph-api.html
# page=False returns just the first page of results as a single dict.
data = graph.get(group_id + "/feed", page=False, retry=3, limit=800)
with open('content.json', 'w') as outfile:
    json.dump(data, outfile, indent=4)
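If you want to stay closer to the raw Graph API (the copy-the-"next"-link workflow from the question), the same iteration can be automated by following the paging.next URL until it disappears. Here is a minimal sketch using the requests library; the group ID, access token, and file names are placeholders, and the exact response layout should be checked against the current Graph API docs:

import json
import requests

group_id = "YOUR_GROUP_ID"
access_token = "YOUR_ACCESS_TOKEN"

# Start from the first page of the group feed.
url = "https://graph.facebook.com/%s/feed" % group_id
params = {"access_token": access_token, "limit": 100}

page = 0
while url:
    response = requests.get(url, params=params).json()
    with open("content%i.json" % page, "w") as outfile:
        json.dump(response, outfile, indent=4)
    # The API returns a ready-made URL for the next page; stop when it is gone.
    url = response.get("paging", {}).get("next")
    # That "next" URL already carries the token and cursor, so clear the params.
    params = {}
    page += 1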
I've just found and used @dfdfdf's solution, which is great! You can generalize it to download multiple pages of the feed, rather than just the first one, like so:
from facepy import GraphAPI
import json
group_id = "YOUR_GROUP_ID"
access_token = "YOUR_ACCESS_TOKEN"
graph = GraphAPI(access_token)
# page=True makes facepy return a generator that follows the paging links for you.
pages = graph.get(group_id + "/feed", page=True, retry=3, limit=1000)
i = 0
for p in pages:
    print 'Downloading page', i
    # Each page of the feed goes into its own numbered JSON file.
    with open('content%i.json' % i, 'w') as outfile:
        json.dump(p, outfile, indent=4)
    i += 1
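Note that neither snippet pulls in the comments and images the question asks about; by default the feed only returns basic post data. A hedged variation is sketched below: it assumes the Graph API's comments and attachments fields and that facepy forwards extra keyword arguments as query parameters, so double-check both against the current docs:

from facepy import GraphAPI
import json

group_id = "YOUR_GROUP_ID"
access_token = "YOUR_ACCESS_TOKEN"
graph = GraphAPI(access_token)

# Assumed field list: asks the API to include comments and attachments
# (images) alongside each post.
fields = "message,created_time,comments,attachments"
pages = graph.get(group_id + "/feed", page=True, retry=3, fields=fields, limit=100)

i = 0
for p in pages:
    print 'Downloading page', i
    with open('content%i.json' % i, 'w') as outfile:
        json.dump(p, outfile, indent=4)
    i += 1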