Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Since vs updated_time vs created_time

I'm writing an app to collect facebook posts matching a certain search term, and I'm trying to fetch only new or updated posts since the graph.facebook.com/search endpoint. I've concluded from debugging that this particular endpoint uses time-based pagination (since, until), so here's my process:

  1. fetch new posts using the most recent 'since' time (default to now - 5 mins at start)
  2. update my 'since' time to the most recent created_time or updated_time from the list of return posts
  3. sleep X seconds, repeat

However, I can't even see my own newly created posts. I do get some results, but they seem random in terms of why they match my search and not my own. For testing purposes, I'm using a user-level access token generated using the FB developer tools, so I should definitely not have any permissions issues restricting me from seeing my own content.

What gives?

Edit: More testing reveals that I can randomly receive SOME of my own posts, but there appears to be no rhyme or reason why one post shows up and the others don't. For example, I just posted 3 posts and received the second one via my app. The first and third are nowhere to be found.

like image 841
Chris Avatar asked Feb 15 '26 08:02

Chris


1 Answers

I think what you are seeing here is an artefact of the consistency model Facebook is using. You can see another example of this when you look at your feed from two different devices. If I look at my feed from my smartphone and then go and check out my feed on my PC, sometimes I see the same items and sometimes there are items I saw on one device, that I didn't see on the other. This is because Facebook uses Eventual consistency.

In simple terms this means that given enough time, all data clusters will be consistent, but this is not guaranteed at any given time point. The bad news is: there is not much you can do about this. It's just a fact-of-life when working with very large distributed systems (and Facebook is one of the largest in the world). At this scale it is just not practical, where technology is today, to keep all copies of the data completely in sync at all times. What I think you are seeing is two requests serviced by two clusters which are currently not 100% in sync.

Here is an interesting read on the subject. And here is something from Facebook. You can skip to the Consistancy section of the page (Although, I would recommend reading the entire post. It is a very interesting overview of Facebook architecture).

like image 102
Elad Lachmi Avatar answered Feb 21 '26 15:02

Elad Lachmi