Firefeed is a very nice example of what can be achieved with Firebase: a fully client-side Twitter clone. There is a page, https://firefeed.io/about.html, where the logic behind the adopted data structure is explained, and it helps a lot in understanding Firebase security rules.
Towards the end of the demo, there is this snippet of code:
var userid = info.id; // info is from the login() call earlier.
var sparkRef = firebase.child("sparks").push();
var sparkRefId = sparkRef.name();
// Add spark to global list.
sparkRef.set(spark);
// Add spark ID to user's list of posted sparks.
var currentUser = firebase.child("users").child(userid);
currentUser.child("sparks").child(sparkRefId).set(true);
// Add spark ID to the feed of everyone following this user.
currentUser.child("followers").once("value", function(list) {
list.forEach(function(follower) {
var childRef = firebase.child("users").child(follower.name());
childRef.child("feed").child(sparkRefId).set(true);
});
});
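After these writes, the data ends up laid out roughly like this (a sketch with placeholder IDs; the spark's own fields are whatever was put into the spark object):

{
  "sparks": {
    "<sparkRefId>": { ...the spark object... }
  },
  "users": {
    "<userid>": {
      "sparks": { "<sparkRefId>": true },
      "followers": { "<followerId>": true, ... }
    },
    "<followerId>": {
      "feed": { "<sparkRefId>": true }
    }
  }
}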
It shows how the writes are done in order to keep the reads simple; as stated:
When we need to display the feed for a particular user, we only need to look in a single place
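If I read that right, rendering a user's feed then looks roughly like this (a sketch in the same legacy API as the snippet above; renderSpark is a hypothetical display function):

// Read the single location that lists the IDs in this user's feed.
firebase.child("users").child(userid).child("feed").once("value", function(feed) {
  feed.forEach(function(item) {
    var sparkId = item.name();
    // Fetch each spark once from the global list and render it.
    firebase.child("sparks").child(sparkId).once("value", function(spark) {
      renderSpark(spark.val());
    });
  });
});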
So I do understand that. But if we take a look at Twitter, we can see that some accounts have several million followers (the most followed is Katy Perry, with over 61 million!). What would happen with this structure and this approach? Whenever Katy posted a new tweet, it would trigger 61 million write operations. Wouldn't this simply kill the app? And on top of that, isn't it consuming a lot of unnecessary space?
For context: all Firebase Realtime Database data is stored as JSON objects. You can think of the database as a single cloud-hosted JSON tree; there are no tables or records, and anything you add becomes a node under a key in that tree. Cloud Firestore, by contrast, stores data as documents organized into collections, where each document can hold nested, complex objects. Neither is a relational database management system (RDBMS) queried with the domain-specific language SQL.
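For comparison, the same "spark ID in a follower's feed" entry could be modeled in Firestore as a document in a subcollection (a minimal sketch, assuming the Firestore JS SDK; the paths are hypothetical):

// Here `db` is a Firestore instance, not the Realtime Database ref used above.
var db = firebase.firestore();
// Firestore documents must be objects, so a small map stands in for the `true` flag.
db.collection("users").doc(followerId)
  .collection("feed").doc(sparkRefId)
  .set({ exists: true });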
With denormalized data, the only way to connect data is to write to every location it's read from. So yes, publishing a tweet to 61 million followers would require 61 million writes.
You wouldn't do this in the browser. A server would listen for child_added events for new tweets, and a cluster of workers would then split up the load, each paginating through a subset of followers at a time. You could potentially prioritize online users so they get the writes first.
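A rough sketch of that server-side fan-out (assuming a slightly newer SDK that has orderByKey()/limitToFirst() and snapshot.key(); the page size, the author field on the spark, and the hand-off to workers are all assumptions):

// Listen for newly published sparks.
firebase.child("sparks").on("child_added", function(sparkSnap) {
  var sparkId = sparkSnap.key();
  var authorId = sparkSnap.val().author; // assumes each spark records its author's id
  var followersRef = firebase.child("users").child(authorId).child("followers");
  var PAGE_SIZE = 1000; // arbitrary; in practice each page would go to a worker

  function fanOutPage(startKey) {
    var query = startKey === null
      ? followersRef.orderByKey().limitToFirst(PAGE_SIZE)
      : followersRef.orderByKey().startAt(startKey).limitToFirst(PAGE_SIZE);
    query.once("value", function(page) {
      var lastKey = null;
      page.forEach(function(follower) {
        lastKey = follower.key();
        // Idempotent write, so re-processing the page boundary is harmless.
        firebase.child("users").child(lastKey).child("feed").child(sparkId).set(true);
      });
      // startAt() is inclusive, so a full page means there may be more followers.
      if (page.numChildren() === PAGE_SIZE) fanOutPage(lastKey);
    });
  }
  fanOutPage(null);
});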
With normalized data, you write the tweet once, but pay for the join on reads. If you cache tweets in feeds to avoid hitting the database for each request, you're back to 61 million writes to Redis for every Katy Perry tweet. And to push the tweet in real time, you need to write it to a socket for every online follower anyway.
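The read-side join in the normalized version would look roughly like this (a sketch; the "following" and "tweets" paths and the addToFeed merge step are hypothetical, since the Firefeed structure only stores "followers"):

// Assemble the feed at read time by joining across the users someone follows.
firebase.child("users").child(userid).child("following").once("value", function(following) {
  following.forEach(function(followed) {
    firebase.child("users").child(followed.key()).child("tweets")
      .orderByKey().limitToLast(20)
      .once("value", function(tweets) {
        tweets.forEach(function(t) {
          addToFeed(t.val()); // merge/sort into the rendered feed
        });
      });
  });
});

Either way the work doesn't disappear; it just moves between write time and read time.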