Is MongoDB usable as shared memory for a parallel-processing / multiple-instance application?

I'm planning a product that will process updates from multiple data feeds. Input data is guesstimated to total a 100 Mbps stream of 100-byte messages. These messages contain several data fields that need to be checked for correlation with the existing data set within the application. If an input message correlates with an existing data record, it will update that record; if not, it will create a new one. It is assumed that each record is updated every 3 seconds on average.
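Roughly, I picture each message being applied like this (only a sketch, using PyMongo; the database, collection, and correlation field names are placeholders):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
records = client.feeds.records  # hypothetical database and collection names

def apply_message(msg):
    """Update the correlated record if one exists; otherwise insert a new one."""
    records.update_one(
        {"feed_id": msg["feed_id"], "elevation": msg["elevation"]},  # assumed correlation fields
        {"$set": msg},
        upsert=True,  # creates a new record when no match is found
    )
```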

The correlation process is assumed to be a bottleneck, so I intend to make our product able to run load-balanced across multiple processes if needed (most likely on separate hardware or VMs), somewhat in the vicinity of a space-based architecture. I'd then like shared storage between my processes so that all existing data records are visible to all running processes. The shared storage will have to fetch possible correlation candidates through a query/search on some attributes (e.g. elevation). It will also have to support configurable warm redundancy and the ability to store snapshots every 5 minutes for logging.
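For the candidate lookup, I imagine an indexed attribute query along these lines (again only a sketch; the names and the elevation bounds are illustrative):

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
records = client.feeds.records  # hypothetical names, as above

# Index the attribute(s) used to find correlation candidates.
records.create_index([("elevation", ASCENDING)])

# Fetch possible candidates for correlation with an incoming message.
candidates = list(records.find({"elevation": {"$gte": 100, "$lte": 200}}))
```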

Everything seems to be pointing towards MongoDB, but I'd like confirmation that it will meet my needs. So, do you think it is a go? Thank you!

NB: I am not considering a relational database because we want to keep all the coding in our application, instead of having to write stored procedures/functions in a separate environment to optimize the performance of our system. Furthermore, the data is diverse and I don't want to try to normalize it into a schema.

asked Oct 22 '22 by Lotharlux


1 Answer

Yes, MongoDB will meet your needs. I think the following aspects of your description are particularly relevant to your database selection decision:

1. An update happens every 3 seconds

MongoDB has a database-level write lock (usually short-lived) that blocks read operations. This means you will want to ensure that you have enough memory to fit your working set, so you generally won't run into write-lock issues. Note that bulk inserts hold the write lock for longer.

If you are sharding, you will want to choose a shard key that allows for write scaling, i.e. one that distributes writes across different shards.
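For example, a hashed shard key spreads inserts across shards rather than concentrating them on one; this is only a sketch, and the database, collection, and field names are assumptions:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017")  # connect through mongos; host is a placeholder

# Hashing the key distributes writes across shards instead of
# sending them all to the shard owning the highest key range.
client.admin.command("enableSharding", "feeds")           # assumed database name
client.admin.command("shardCollection", "feeds.records",  # assumed collection name
                     key={"feed_id": "hashed"})           # assumed field
```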

2. Shared storage for multiple processes

This is a pretty common scenario; in fact, many MongoDB deployments are expected to be accessed from multiple processes concurrently. Unlike the write lock, a read lock does not block other reads.

3. Warm redundancy

Supported through MongoDB replication. If you'd like to read from secondary server(s), you will need to set the read preference to secondaryPreferred in your driver.
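In the Python driver, for example, this can be set on the connection (the connection string here is a placeholder):

```python
from pymongo import MongoClient

# Reads are routed to a secondary when one is available,
# falling back to the primary otherwise.
client = MongoClient(
    "mongodb://node1:27017,node2:27017/?replicaSet=rs0",  # hypothetical replica set
    readPreference="secondaryPreferred",
)
```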

answered Oct 27 '22 by Zaid Masud