
Best practices for logging real-time data into a NoSQL DB

I have a Java web application that receives real-time events and pushes them to the user interface layer. I want to log every received event, and since the volume of information will be huge, I would prefer to use a NoSQL database.

I have set up MongoDB for this purpose, inserting one document per event. The problem is that this approach (a disk access per event) slows down the whole process dramatically.

So, what approaches can I take in this situation? What options are available in MongoDB for this (e.g. bulk inserting, async inserting, caching, ...)? Would switching to some other NoSQL database make a difference? What are the best practices here?

SJ.Jafari asked Apr 27 '16 07:04



1 Answer

I waited some time to see other answers, but lost my patience. I have used MongoDB as log storage in three projects (two in Java and one in C#). Based on that experience, I can suggest the following rules for organizing logging:

  1. Don't use indexes. If the workload is mostly writes, every index slows down inserts. If you need to analyze the logs afterwards, copy the data to another database or collection and index it there. Unfortunately you cannot get rid of the primary key _id: either leave it as is (an ObjectId by default) or replace it with an auto-incremented NumberLong.

  2. Lower the write concern. MongoDB lets you control how much acknowledgement a write operation waits for. You can map log levels to write rules: for example, DEBUG, INFO and WARN can go with WriteConcern.UNACKNOWLEDGED, while ERROR and FATAL are stored with WriteConcern.ACKNOWLEDGED. This way the application does not pause while writing low-priority messages, and at the same time you know that the important (and rare) messages have reached the storage.

  3. Cache your collection instance. That is, avoid resolving Mongo's objects through getDB or getCollection every time a message arrives.

  4. Minimize the amount of data sent over the network. Restrict each message to a minimal set of fields. Truncate overly long stack traces. Look at how Spring 3.x shortens the full class name: s.w.s.m.m.a.RequestMappingHandlerMapping instead of some.whatever.sub.main.minimal.agent.RequestMappingHandlerMapping
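
As a rough illustration of rule 4, here is a minimal sketch that uses only the Java standard library. The class name LogTrim and both method names are hypothetical, invented for this example; they are not part of Spring, MongoDB or any logging framework. The idea is to shrink the logger name and the stack trace before the message is turned into a document and sent to the database.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of any library: trims log fields
// before they are written to the log storage.
final class LogTrim {

    private LogTrim() {}

    // Shortens every package segment to its first letter and keeps
    // the simple class name (the last segment) untouched.
    static String abbreviate(String className) {
        String[] parts = className.split("\\.");
        StringJoiner out = new StringJoiner(".");
        for (int i = 0; i < parts.length; i++) {
            boolean last = (i == parts.length - 1);
            out.add(last || parts[i].isEmpty() ? parts[i] : parts[i].substring(0, 1));
        }
        return out.toString();
    }

    // Renders the exception header plus at most maxFrames stack frames,
    // followed by a note saying how many frames were dropped.
    static String truncatedStackTrace(Throwable t, int maxFrames) {
        StackTraceElement[] frames = t.getStackTrace();
        StringBuilder sb = new StringBuilder(t.toString());
        int shown = Math.min(Math.max(maxFrames, 0), frames.length);
        for (int i = 0; i < shown; i++) {
            sb.append("\n\tat ").append(frames[i]);
        }
        if (frames.length > shown) {
            sb.append("\n\t... ").append(frames.length - shown).append(" more");
        }
        return sb.toString();
    }
}
```

Calling LogTrim.abbreviate on some.whatever.sub.main.minimal.agent.RequestMappingHandlerMapping gives s.w.s.m.m.a.RequestMappingHandlerMapping, the same shape as the Spring output quoted above. How many frames to keep is a judgment call; whatever limit you pick, apply it before building the document so the full trace never goes over the wire.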

Dewfy answered Sep 21 '22 23:09