Hybrid DB System: NoSQL for data, SQL for relationships. Best Practice? [closed]

I'm building an app whose database layer will be crucial and needs to scale, since all of the app's value will be in its data.

I'm making a live voting system.

I am comfortable with both SQL and MongoDB, so familiarity is hardly a deciding factor (though I tend to like MongoDB's structure and JS more these days :) )

But from everything I've read on the web, I still feel uncomfortable with my decision.

What I want to do is combine the advantages of both:

Having NoSQL documents for objects (Users, Items, Comments etc.)
Having SQL tables for relationships (tables User-Items, User-Comments etc.)
Duplicating the vote results in a NoSQL document whenever there is a vote, or at a regular interval (to also gain speed when displaying vote results)
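
To make the third point concrete, here is a minimal sketch of that write path. Everything in it is an assumption for illustration: an in-memory SQLite table stands in for the SQL side, a plain dict stands in for the document store, and the names `votes`, `item_docs` and `cast_vote` are hypothetical.

```python
import sqlite3

# Hypothetical stand-ins: an in-memory SQLite table plays the SQL side,
# and a plain dict plays the document store (e.g. a MongoDB collection).
sql = sqlite3.connect(":memory:")
sql.execute(
    "CREATE TABLE votes ("
    " user_id INTEGER NOT NULL,"
    " item_id INTEGER NOT NULL,"
    " PRIMARY KEY (user_id, item_id))"  # at most one vote per user per item
)
item_docs = {42: {"title": "Example item", "vote_count": 0}}


def cast_vote(user_id: int, item_id: int) -> bool:
    """Record a vote in SQL, then refresh the cached tally in the document."""
    try:
        with sql:  # commits on success, rolls back on error
            sql.execute(
                "INSERT INTO votes (user_id, item_id) VALUES (?, ?)",
                (user_id, item_id),
            )
    except sqlite3.IntegrityError:
        return False  # duplicate vote rejected by the primary key

    # Second write, to a different store. If this step fails, the cached
    # count drifts away from the SQL source of truth.
    (count,) = sql.execute(
        "SELECT COUNT(*) FROM votes WHERE item_id = ?", (item_id,)
    ).fetchone()
    item_docs[item_id]["vote_count"] = count
    return True


cast_vote(1, 42)
cast_vote(2, 42)
cast_vote(1, 42)  # duplicate, rejected
print(item_docs[42]["vote_count"])
```

Note that the two writes are not covered by a single transaction, so the cached count is only as reliable as the code that refreshes it.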

Great advantages I see are:

When querying a document (e.g. a user, to display their profile), I have all the NoSQL benefits (speed, everything in one place, schema flexibility etc.)
When doing stats (e.g. number of votes), I have all the SQL benefits
Parallelisation: I can fetch the votes from SQL and the documents asynchronously, at the same time
Reads are fast, writes are slowish (and that doesn't matter in my case)
Relationship integrity is always preserved
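
The parallel read from the third advantage could look roughly like the sketch below. It is only an illustration under stated assumptions: SQLite and a dict stand in for the two stores, the function names are hypothetical, and a real setup would use proper async drivers instead of `asyncio.to_thread` (which needs Python 3.9+).

```python
import asyncio
import sqlite3

# Hypothetical data: SQLite stands in for the SQL store, a dict for the
# document store.
sql = sqlite3.connect(":memory:", check_same_thread=False)
sql.execute("CREATE TABLE votes (user_id INTEGER, item_id INTEGER)")
sql.executemany("INSERT INTO votes VALUES (?, ?)", [(1, 42), (2, 42), (3, 7)])
sql.commit()
item_docs = {42: {"title": "Example item"}}


def count_votes(item_id: int) -> int:
    """Blocking SQL query returning the number of votes for one item."""
    row = sql.execute(
        "SELECT COUNT(*) FROM votes WHERE item_id = ?", (item_id,)
    ).fetchone()
    return row[0]


async def fetch_vote_count(item_id: int) -> int:
    # Run the blocking query in a worker thread so it does not stall the loop.
    return await asyncio.to_thread(count_votes, item_id)


async def fetch_document(item_id: int) -> dict:
    await asyncio.sleep(0)  # placeholder for a network round trip
    return item_docs[item_id]


async def load_item(item_id: int) -> dict:
    # Both lookups are started together and awaited as a pair.
    votes, doc = await asyncio.gather(
        fetch_vote_count(item_id), fetch_document(item_id)
    )
    return {**doc, "votes": votes}


result = asyncio.run(load_item(42))
print(result)
```

The total latency is then roughly that of the slower of the two lookups rather than their sum, at the price of merging the two results in application code.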

My questions are:

Is this a good practice? The web seems pretty quiet about it.
Am I optimizing for peanuts, even under high DB load? (comparing document fetching to doing everything in SQL, with queries like select * from table where primary_key = XXX)

Seven years later, I've stumbled upon my own question, and I feel like I could now help my past self.

Today, I would go for PostgreSQL JSON types.

This lets you keep tables, relationships and indexes, which are great for comprehension and atomicity, along with extensible fields in the users table, such as an identity field that would look like:

identity {
  firstName: "John",
  lastName: "Doe",
  address: "5 example st",
  postCode: "XXX",
  city: "Example city"
}

which can be queried like: select * from users u where u.identity ->> 'lastName' = 'Doe' (the key has to be a quoted string literal).

Though this may be quite surprising at first, it works very well. It is even better when the ORM supports those types out of the box, as Ecto, Active Record and others do.
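
Here is a small runnable sketch of the same idea. It is not PostgreSQL: to stay self-contained it uses SQLite's `json_extract` as a stand-in (assuming the bundled SQLite has its JSON functions enabled, which is the usual case). On a PostgreSQL json/jsonb column the equivalent filter would be `identity ->> 'lastName' = 'Doe'`. The `email` column and the sample rows are made up for the example.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL,"      # regular, strictly typed column (hypothetical)
    " identity TEXT NOT NULL)"   # free-form JSON document stored as text
)

people = [
    ("john@example.com", {"firstName": "John", "lastName": "Doe", "city": "Example city"}),
    ("jane@example.com", {"firstName": "Jane", "lastName": "Roe"}),  # no city: schema is flexible
]
db.executemany(
    "INSERT INTO users (email, identity) VALUES (?, ?)",
    [(email, json.dumps(identity)) for email, identity in people],
)

# Filter on a key inside the JSON document, next to ordinary columns.
rows = db.execute(
    "SELECT email FROM users WHERE json_extract(identity, '$.lastName') = ?",
    ("Doe",),
).fetchall()
print(rows)
```

The relational parts (keys, joins, constraints) stay in ordinary columns, and only the loosely structured part lives in the JSON field.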

"Best practice" is a horrible term - it is often used to justify gut instinct, "this is how we've always done it", or other prejudice.

The solution you describe has a bunch of benefits (you mention a few), but also some significant drawbacks, mostly because you are splitting knowledge of your problem domain across two incompatible data stores. This opens up lots of opportunities for duplication, and also for inconsistency.

For instance, the knowledge that a given user is identified by a certain identifier would be shared between your NoSQL system and your SQL database. If one system deletes that user, the other is left in an inconsistent state. A given user's profile would be split across two systems, and neither would have a complete picture; you'd need lots of housekeeping synchronization code.
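
As a rough illustration of that housekeeping, here is a minimal sketch of a reconciliation check that compares the user IDs known to each store. The function name and the sample IDs are hypothetical; a real job would page through both stores rather than load everything into memory.

```python
from typing import Dict, Iterable, List


def find_orphans(
    sql_user_ids: Iterable[int], doc_user_ids: Iterable[int]
) -> Dict[str, List[int]]:
    """Report user IDs that exist in one store but not in the other."""
    sql_ids, doc_ids = set(sql_user_ids), set(doc_user_ids)
    return {
        # Referenced by SQL relationship rows, but the document is gone.
        "missing_documents": sorted(sql_ids - doc_ids),
        # Document still exists, but SQL no longer knows the user.
        "dangling_documents": sorted(doc_ids - sql_ids),
    }


# Hypothetical snapshot: user 1 was deleted from the document store only,
# user 4 was deleted from SQL only.
report = find_orphans(sql_user_ids=[1, 2, 3], doc_user_ids=[2, 3, 4])
print(report)
```

A check like this only detects drift after the fact; it does not say which store is right, which is a decision you would have to make per entity.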

Developers working on your platform would need expertise in both technology stacks - imagine trying to debug why a given user's comment count appears to be incorrect.

You now have two points of failure - if either the NoSQL or the SQL database fails, your entire system breaks. And failure may not mean crashing - it may also mean performance issues, or problems with upgrades, or problems with backups.
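
A back-of-the-envelope sketch of what that means for uptime: if every request needs both stores, their availabilities multiply. The 99.9% figure below is a made-up number, and the calculation assumes the two stores fail independently, which real systems may not.

```python
def combined_availability(*availabilities: float) -> float:
    """Availability of a system that needs every listed component to be up."""
    total = 1.0
    for availability in availabilities:
        total *= availability
    return total


HOURS_PER_YEAR = 24 * 365

single = 0.999  # hypothetical availability of each database
both = combined_availability(single, single)

print(f"one store:  {(1 - single) * HOURS_PER_YEAR:.1f} hours of downtime per year")
print(f"two stores: {(1 - both) * HOURS_PER_YEAR:.1f} hours of downtime per year")
```

Under these assumptions the expected downtime roughly doubles, before counting the softer failures mentioned above.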

It's not uncommon for software solutions to have multiple systems each owning a part of the data, but the split is usually along business domain lines (the CRM system knows your profile, the payment system knows your credit card details, the ecommerce system knows what you ordered). Splitting along technical lines instead would create a complex architecture with multiple points of failure.

I don't think the benefits outweigh those drawbacks.

Tags: database, database-design, nosql

Augustin Riedinger

2 Answers: Augustin Riedinger, Neville Kuyt
