Firebase Database - the "Fan Out" technique

Tags: java, android, nosql, firebase, firebase-realtime-database

I was investigating the Firebase Database sample for Android and realized that it stores its data in the following way:

[Image: screenshot of the sample's database structure, https://i.stack.imgur.com/gapwx.png]

I am not very familiar with NoSQL techniques and am trying to understand why we have to persist each post entity twice, at posts and at user-posts. The documentation says that this approach is called "Fan Out", and I fully agree that it might be useful to access a user's posts via a simple construction like databaseReference.child("user-posts").child("<user_uid>"). But why do we need the posts node then? And if we need to update a post, do we have to do it twice?

// [START write_fan_out]
private void writeNewPost(String userId, String username, String title, String body) {
    // Create new post at /user-posts/$userid/$postid and at
    // /posts/$postid simultaneously
    String key = mDatabase.child("posts").push().getKey();
    Post post = new Post(userId, username, title, body);
    Map<String, Object> postValues = post.toMap();

    Map<String, Object> childUpdates = new HashMap<>();
    childUpdates.put("/posts/" + key, postValues);
    childUpdates.put("/user-posts/" + userId + "/" + key, postValues);

    mDatabase.updateChildren(childUpdates);
}
// [END write_fan_out]

So I wonder: when is this approach useful, and when is it not? Does the Firebase SDK provide any tools to keep all duplicates in sync when updating or removing data?

UPDATE: Here is the explanation received from Firebase team:

the reason the posts are duplicated is because we want to be able to quickly get all the posts belonging to a user (as you suggested) and filtering from the list of all posts ever to get the posts by one user can get pretty expensive as the number of posts expands.

This does mean that we have to update the post in two locations whenever we update it. It makes the code a little uglier but since queries are more common than writes it's better to optimize for reading the data.

This approach may not look very elegant, but I suspect it is the fastest option for large data sets as long as you read (SELECT) more often than you write (UPDATE). For some cases, though, I'd rather stick to the other solutions recommended here.
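For what it's worth, the "update the post in two locations" step does not have to be two separate calls. Here is a minimal sketch, assuming the /posts and /user-posts layout from the sample. FanOutPaths and its methods are hypothetical helpers I made up for illustration, not part of the Firebase SDK; they only build the path-to-value map that would then be handed to updateChildren, the same call writeNewPost uses. In the Realtime Database, a null value in such a map removes that path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of the Firebase SDK): builds the path -> value
// maps for a multi-path update against the sample's /posts and /user-posts layout.
final class FanOutPaths {

    // Both locations that hold a copy of the given post.
    static String[] copiesOf(String userId, String postId) {
        return new String[] {
            "/posts/" + postId,
            "/user-posts/" + userId + "/" + postId
        };
    }

    // Edit: put the same new values at every copy. The value at each listed
    // path is replaced as a whole, so postValues should be the full post.
    static Map<String, Object> editPost(String userId, String postId,
                                        Map<String, Object> postValues) {
        Map<String, Object> childUpdates = new HashMap<>();
        for (String path : copiesOf(userId, postId)) {
            childUpdates.put(path, postValues);
        }
        return childUpdates;
    }

    // Delete: a null value asks the database to remove that path.
    static Map<String, Object> deletePost(String userId, String postId) {
        Map<String, Object> childUpdates = new HashMap<>();
        for (String path : copiesOf(userId, postId)) {
            childUpdates.put(path, null);
        }
        return childUpdates;
    }

    public static void main(String[] args) {
        Map<String, Object> values = new HashMap<>();
        values.put("title", "Edited title");
        // In the app, each map would be passed to mDatabase.updateChildren(...).
        System.out.println(editPost("uid1", "post1", values).keySet());
        System.out.println(deletePost("uid1", "post1").keySet());
    }
}
```

Because both paths travel in one map, the copies are changed by a single write instead of two independent ones. You still have to know every location where a copy lives.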

372

asked Jul 04 '16 10:07

fraggjkee

2 Answers

Data Fan Out is a great technique to manage massive amounts of data. If you do not use this pattern, you could have serious scaling problems in the future.

From your database structure, I see that you are storing the whole post twice, which is not good practice. You want to store just a reference to the post under the other node instead. So you will have a node named users-posts, which consists of user keys, and each of those keys holds a set of post keys with a value of true. To make it clearer:

[Image: screenshot of the suggested database structure, https://i.stack.imgur.com/a3yqt.png]

This way, you are tracking which posts the user has written under the users-posts node, and also which user has written each post under the posts node. Now, you may need to get a list of all of a user's posts. To do that, synchronize on the users-posts/USER_KEY/ node to get the keys of all the posts that user has written, and then fetch the details of each post using the post key you just got.
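Here is a rough sketch of that layout and the two-step read. It uses plain in-memory maps in place of the database so that it runs on its own. KeyIndexSketch, its method names, and the uid/title/body field names are my own assumptions for illustration; in a real app each lookup would be a listener on users-posts/USER_KEY/ and on posts/POST_KEY.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the two nodes (hypothetical, for illustration only).
final class KeyIndexSketch {
    // posts/POST_KEY -> full post data, stored exactly once
    final Map<String, Map<String, Object>> posts = new HashMap<>();
    // users-posts/USER_KEY/POST_KEY -> true (keys only, no post content)
    final Map<String, Map<String, Boolean>> usersPosts = new HashMap<>();

    // Store the post once and record only its key under the author.
    void writePost(String userId, String postId, String title, String body) {
        Map<String, Object> post = new HashMap<>();
        post.put("uid", userId); // the post also records who wrote it
        post.put("title", title);
        post.put("body", body);
        posts.put(postId, post);
        usersPosts.computeIfAbsent(userId, k -> new TreeMap<>()).put(postId, true);
    }

    // Step 1: read users-posts/USER_KEY/ to get the post keys (sorted by key).
    List<String> postKeysOf(String userId) {
        Map<String, Boolean> index = usersPosts.get(userId);
        return index == null ? new ArrayList<>() : new ArrayList<>(index.keySet());
    }

    // Step 2: read posts/POST_KEY for the keys you actually need.
    Map<String, Object> postByKey(String postId) {
        return posts.get(postId);
    }

    public static void main(String[] args) {
        KeyIndexSketch db = new KeyIndexSketch();
        db.writePost("alice", "p1", "First", "Hello");
        db.writePost("alice", "p2", "Second", "World");
        db.writePost("bob", "p3", "Other", "Text");
        for (String key : db.postKeysOf("alice")) {
            System.out.println(key + ": " + db.postByKey(key).get("title"));
        }
    }
}
```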

Why is this database design recommended? Because you transfer much less information for each synchronization (with Firebase we are not issuing requests per se, so I call a read a synchronization). In your example, if you attach a listener to user-posts/USER_KEY/ to get a list of all posts, you also download ALL the information of EACH AND EVERY post that user has written. With the data fan out approach described above, you can request only the post information you need, because you already have the keys of the posts.

125

answered Oct 30 '22 10:10

david-ojeda

In my opinion this is not a good approach, since you need to keep the duplicated data in sync and Firebase doesn't provide any tool to keep duplicates in sync. A better approach would be to store only the key in user-posts.

I suggest reading this guide; it is very helpful for understanding how to structure data: https://www.firebase.com/docs/web/guide/structuring-data.html
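To show what this buys you, here is a small sketch of the writes when user-posts holds only keys. KeyOnlyUpdates is a hypothetical helper and the title field name is an assumption; the maps are what you would pass to updateChildren, where a null value removes a path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: update maps for a layout where /user-posts stores only keys.
final class KeyOnlyUpdates {

    // Edit: the post content lives only under /posts, so a single path changes.
    static Map<String, Object> editTitle(String postId, String newTitle) {
        Map<String, Object> childUpdates = new HashMap<>();
        childUpdates.put("/posts/" + postId + "/title", newTitle);
        return childUpdates;
    }

    // Delete: remove the post and its key from the author's index.
    static Map<String, Object> deletePost(String userId, String postId) {
        Map<String, Object> childUpdates = new HashMap<>();
        childUpdates.put("/posts/" + postId, null);
        childUpdates.put("/user-posts/" + userId + "/" + postId, null);
        return childUpdates;
    }

    public static void main(String[] args) {
        System.out.println(editTitle("post1", "New title"));
        System.out.println(deletePost("uid1", "post1").keySet());
    }
}
```

An edit no longer has a second copy to keep in sync. Only creating and deleting a post touch both nodes.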

answered Oct 30 '22 10:10

Devid Farinelli
