Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL vs NoSQL for an inventory management system

I am developing a JAVA based web application. The primary aim is to have inventory for products being sold on multiple websites called channels. We will act as manager for all these channels. What we need is:

  1. Queues to manage inventory updates for each channel.
  2. Inventory table which has a correct snapshot of allocation on each channel.
  3. Keeping Session Ids and other fast access data in a cache.
  4. Providing a facebook like dashboard(XMPP) to keep the seller updated asap.

The solutions i am looking at are postgres(our db till now in a synchronous replication mode), NoSQL solutions like Cassandra, Redis, CouchDB and MongoDB.

My constraints are:

  1. Inventory updates cannot be lost.
  2. Job Queues should be executed in order and preferably never lost.
  3. Easy/Fast development and future maintenance.

I am open to any suggestions. thanks in advance.

like image 659
gladiator Avatar asked Nov 30 '11 07:11

gladiator


2 Answers

  1. Queues to manage inventory updates for each channel.

This is not necessarily a database issue. You might be better off looking at a messaging system(e.g. RabbitMQ)

  1. Inventory table which has a correct snapshot of allocation on each channel.
  2. Keeping Session Ids and other fast access data in a cache.

session data should probably be put in a separate database more suitable for the task(e.g. memcached, redis, etc) There is no one-size-fits-all DB

  1. Providing a facebook like dashboard(XMPP) to keep the seller updated asap.

My constraints are: 1. Inventory updates cannot be lost.

There are 3 ways to answer this question:

  1. This feature must be provided by your application. The database can guarantee that a bad record is rejected and rolled back, but not guarantee that every query will get entered. The app will have to be smart enough to recognize when an error happens and try again.

  2. some DBs store records in memory and then flush memory to disk peridocally, this could lead to data loss in the case of a power failure. (e.g Mongo works this way by default unless you enable journaling. CouchDB always appends to the records(even a delete is a flag appended to the record so data loss is extremely difficult))

  3. Some DBs are designed to be extremely reliable, even if an earthquake, hurricane or other natural disaster strikes, they remain durable. these include Cassandra, Hbase, Riak, Hadoop, etc

Which type of durability are your referring to?

  1. Job Queues should be executed in order and preferably never lost.

Most noSQL solutions prefer to run in parallel. so you have two options here. 1. use a DB that locks the entire table for every query(slower) 2. build your app to be smarter or evented(client side sequential queuing)

  1. Easy/Fast development and future maintenance.

generally, you will find that SQL is faster to develop at first, but changes can be harder to implement noSQL may require a little more planning, but is easier to do ad hoc queries or schema changes.

The questions you probably need to ask yourself are more like:

  1. "Will I need to have intense queries or deep analysis that a Map/Reduce is better suited to?"

  2. "will I need to my change my schema frequently?

  3. "is my data highly relational? in what way?"

  4. "does the vendor behind my chosen DB have enough experience to help me when I need it?"

  5. "will I need special feature such as GeoSpatial indexing, full text search, etc?"

  6. "how close to realtime will I need my data? will it hurt if I don't see the latest records show up in my queries until 1sec later? what level of latency is acceptable?"

  7. "what do I really need in terms of fail-over"

  8. "how big is my data? will it fit in memory? will it fit on one computer? is each individual record large or small?

  9. "how often will my data change? is this an archive?"

If you are going to have multiple customers(channels?) each with their own inventory schemas, a document based DB might have it's advantages. I remember one time I looked at an ecommerce system with inventory and it had almost 235 tables! Then again, if you have certain relational data, a SQL solution can really have some advantages too.

I can certainly see how I could build a solution using mongo, couch, riak or orientdb with the given constraints. But as for which is the best? I would try talking directly DB vendors, and maybe watch the nosql tapes

like image 120
BenG Avatar answered Sep 20 '22 15:09

BenG


Addressing your constraints:

  1. Most NoSQL solutions give you a configurable tradeoff of consistency vs. performance. In MongoDB, for instance, you can decide how durable a write should be. If you want to, you can force the write to be fsync'ed on all your replica set servers. On the other extreme, you can choose to send the command and don't even wait for the server's response.

  2. Executing job queues in order seems to be an application code issue. I'd say a timestamp in the db and an order by type of query should do for most applications. If you have multiple application servers and your queues need to be perfect, you'd have to use a truly distributed algorithm that provides ordering, but that is not a typical requirement, and it's very tricky indeed.

  3. We've been using MongoDB for some time now, and I'm convinced this gives your app development speed a real boost. There's no big difference in maintenance, maintaining data is a pain either way. Not having a schema gives you added flexibility (lazy migrations), but it's more elaborate and requires some care.

In summary, I'd say you can do it both ways. The NoSQL is more code driven, and transactions and relational integrity are mostly managed by your code. If you're uncomfortable with that, go for a relational DB.

However, if you're data grows huge, you'll have to code some of this logic manually because you probably wouldn't want to do real-time joins on a 10B row database. Still, you can implement that with SQL as well.

A good way to find the boundary for different databases is to consider what you can cache. Data that can be cached and reconstructed at any time are a great way to start introducing a new layer, because there's no big risks there. Also, cached data usually doesn't keep any relations so you're not sacrificing any consistency here.

like image 30
mnemosyn Avatar answered Sep 17 '22 15:09

mnemosyn