I have a specific problem with many to many relationship implementations in MongoDB. <blockquote> I have collections of Songs and Artists(Millions document). Here the song can be sung by Many Artists and an artist can sing Many songs. So I followed the approach of Document referencing in both collections. like this... </blockquote> 1. Songs Collection:- <pre class="prettyprint"><code>{ _id:ObjectId("dge547567hheheasfw3454dfg"), title:"xyz", artists:[ObjectId("xfvdg464654"), ...] //many artists // artists ids } </code></pre> 2. Artist Collection:- <pre class="prettyprint"><code>{ _id:ObjectId("dge547567hheheasfw3454dfg"), title:"xyz", songs:[ObjectId("xfvdg464654"), ...] //many songs // songs Ids } </code></pre> <blockquote> But here the problem is while deleting artist I have to delete an artist from an array of the artist in all document of the song which has an artist and vice versa. Which can cause the problem of Atomicity. How can I ensure atomicity here? Secondly when the database will grow and songs are sung by the artist will increase thus resulting document growth of both collection and document size can reach to 16MB or greater(MAX DOC SIZE). </blockquote> So what can be done here in this case?

Let's start with detailing our Many-to-Many relationship for your case and try to understand what can be and cannot be done - <ul> <li> A song can be sung by as many as 10 or maybe 20 artists (assuming it isn't as complex/diverse to require maybe 100's of artists). In this case, bucketing artist's id inside of <code>songs</code> collection is perfectly fine and we can safely assume that even in the worst case scenario (storing a complex/diverse song sung by 100 artist) it will never force our song collection beyond 16 MBs. </li> <li>An artist, however may very well sing upto 1000s of songs or maybe more in his enitre career. An ObjectId being 12 bytes long, in this case will grow a collection to a size of merely 12000 bytes which is way lesser than 16000000 bytes. You are still left with a lot of space. So no need to worry about hitting the cap of 16MB.</li> </ul> Approach - 1 <blockquote> Inter-bucketing works really well for relations expecting high reads. </blockquote> Songs for some artists can be fetched in single query and even vice versa. And this would be even smoother with indexes sprinkled over these two collections. But if we go around bucketing artists inside of songs and songs inside of artists then our updates are no more Atomic but for that we can still implement an application level Two-phase commit for both artists and songs CRUD, which even after being a little troublesome, solves the problem. Approach - 2: <blockquote> Why not bucket only artist id's inside of songs collection and have multikey index on that field. </blockquote> List of artists who sang a song is way too short than the list of songs sung by an artist. So we only bucket artists inside of songs collection. This way we will - 1. avoid the near to impossible possibility of hitting maximum size of artist's collection if we had bucketed songs inside of artists collection. 2. avoid writing 2P commits for atleast <code>songs</code> collections. All relational reads can be satisfied via songs collection only (here i am excluding the _id lookup for artist) 3. ensure fast data access in just a single query even when reverse querying on song collection for songs sung by an artist. You will already be having some info(_id) of artist for which you need to fetch songs. You just draft a query like this - <pre class="prettyprint"><code> db.songs.find({ artists: 'your-artist-id' }); </code></pre> And when you explain this query, you find happiness when you realize that it utilizes your multi-key indexes. Great job there ! Now which approach to go for ? I find the second approach a little more subtle for your use case as it reduces some of the complexity of managing 2P commits for atomicity and still provides a good read performance. First approach definitely is reads oriented so if you're sure that you will be recieving a lots and lots of reads on both the collection, go for first one otherwise second one should do the trick.

How to handle Many to Many relationship in mongoDB?

Tags:

mongodb

schema

mongoose

entity-relationship

I have a specific problem with many to many relationship implementations in MongoDB.

I have collections of Songs and Artists(Millions document). Here the song can be sung by Many Artists and an artist can sing Many songs. So I followed the approach of Document referencing in both collections. like this...

1. Songs Collection:-

{
  _id:ObjectId("dge547567hheheasfw3454dfg"),
   title:"xyz",
   artists:[ObjectId("xfvdg464654"), ...] //many artists // artists ids
}

2. Artist Collection:-

{
  _id:ObjectId("dge547567hheheasfw3454dfg"),
   title:"xyz",
   songs:[ObjectId("xfvdg464654"), ...] //many songs // songs Ids 
}

But here the problem is while deleting artist I have to delete an artist from an array of the artist in all document of the song which has an artist and vice versa. Which can cause the problem of Atomicity. How can I ensure atomicity here?

Secondly when the database will grow and songs are sung by the artist will increase thus resulting document growth of both collection and document size can reach to 16MB or greater(MAX DOC SIZE).

So what can be done here in this case?

337

asked Apr 07 '18 19:04

Abhishek Singh

2 Answers

Let's start with detailing our Many-to-Many relationship for your case and try to understand what can be and cannot be done -

A song can be sung by as many as 10 or maybe 20 artists (assuming it isn't as complex/diverse to require maybe 100's of artists).

In this case, bucketing artist's id inside of songs collection is perfectly fine and we can safely assume that even in the worst case scenario (storing a complex/diverse song sung by 100 artist) it will never force our song collection beyond 16 MBs.
An artist, however may very well sing upto 1000s of songs or maybe more in his enitre career. An ObjectId being 12 bytes long, in this case will grow a collection to a size of merely 12000 bytes which is way lesser than 16000000 bytes. You are still left with a lot of space. So no need to worry about hitting the cap of 16MB.

Approach - 1

Inter-bucketing works really well for relations expecting high reads.

Songs for some artists can be fetched in single query and even vice versa. And this would be even smoother with indexes sprinkled over these two collections.

But if we go around bucketing artists inside of songs and songs inside of artists then our updates are no more Atomic but for that we can still implement an application level Two-phase commit for both artists and songs CRUD, which even after being a little troublesome, solves the problem.

Approach - 2:

Why not bucket only artist id's inside of songs collection and have multikey index on that field.

List of artists who sang a song is way too short than the list of songs sung by an artist. So we only bucket artists inside of songs collection.

This way we will -

1. avoid the near to impossible possibility of hitting maximum size of artist's collection if we had bucketed songs inside of artists collection.

2. avoid writing 2P commits for atleast songs collections. All relational reads can be satisfied via songs collection only (here i am excluding the _id lookup for artist)

3. ensure fast data access in just a single query even when reverse querying on song collection for songs sung by an artist.

You will already be having some info(_id) of artist for which you need to fetch songs. You just draft a query like this -

 db.songs.find({ artists: 'your-artist-id' });

And when you explain this query, you find happiness when you realize that it utilizes your multi-key indexes. Great job there !

Now which approach to go for ?

I find the second approach a little more subtle for your use case as it reduces some of the complexity of managing 2P commits for atomicity and still provides a good read performance. First approach definitely is reads oriented so if you're sure that you will be recieving a lots and lots of reads on both the collection, go for first one otherwise second one should do the trick.

189

answered Oct 26 '22 03:10

Himanshu Singh

I implemented many to many relationship in mongodb by taking third collection similar to what we do in sql.

Song Collection

{
  _id:ObjectId("dge547567hheheasfw3454df12"),
   title:"xyz",
   length : 123
}

Artist Collection

{
   _id:ObjectId("dge547567hheheasfw3454d32"),
   name:"abc",
}

SongArtist Collection

{
   _id:ObjectId("dge547567hheheasdfsdfsdfgdfga42"),
   artist: ObjectId("dge547567hheheasfw3454dfg32"),
   song: ObjectId("dge547567hheheasfw3454df12"),
}

Now when you do crud operations and if you want to delete the artist from a song you can do it in single query in SongArtist Collection.
It will never have any problem for exceeding document size
If you want to delete particular artist in particular song you have to query once
It will increase the number of records in a collection but mongodb can handle that very easily.
You can find all the song related to one artist in single query and vice versa.

answered Oct 26 '22 03:10

Vikash Dahiya

Related questions
                            
                                Node.js File Upload (Express 4, MongoDB, GridFS, GridFS-Stream)
                            
                                mongodb set field value using if else during document update
                            
                                is sharding same as distributed database in mongoDB? [closed]
                            
                                docker backup and restore mongodb
                            
                                excluding from $match if data is blank or null in mongodb
                            
                                Check if values matchs in MongoDB [duplicate]
                            
                                MongoDB C# driver timeout
                            
                                Get value from Embedded Document Mongo Java
                            
                                MongoDB: Error parsing command line: unrecognised option '--sslMode'
                            
                                how can I migrate the database dump of MongoDB into PostgreSQL?
                            
                                Consequences of a COLLSCAN in MongoDB?
                            
                                Clear all entries in Docker mongodb?
                            
                                Mongoose Aggregate with $unwind before $group
                            
                                How to use mongorestore/mongodump with username and password?
                            
                                MongoDB's $graphLookup trying to get a tree structure
                            
                                How to post a dynamic JSON property to a C# ASP.Net Core Web API using MongoDB?
                            
                                How could we make use of ClusterListener in Mongo?
                            
                                Mongodb - How to debug aggregation steps?
                            
                                How to truncate a number to 3 decimals
                            
                                How can I log actual queries to MongoDB with mongo java driver

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With