
NoSQL or RDBMS for audit data

I know similar questions have been asked on this subject, but I haven't found one that covers all of my requirements.

I'll start by saying that I only have experience with RDBMSs, so apologies if I get anything about NoSQL wrong.

I'm creating a database that would hold a large amount of audit logs (about 1TB).

I'm using it for:

  1. Fast data writing (a massive amount of audit logs is written all the time)

  2. Search - search over the audit data (actions performed by a certain user, at a certain time, or of a certain type; the database should support searching on any of the 'columns' very quickly)

  3. Analytics & Reporting - generate daily, weekly and monthly reports of the data (they are predefined at the moment; if they become more dynamic, does that affect which solution I should choose?)

Reliability (support for fail-over or a similar feature), scalability (if I grow from 1TB to 2TB, 10TB or 100TB, are there solutions that can't support that amount of data?) and of course performance (in the use cases I specified) are very important to me.

I know RDBMSs, so that would be the easiest place for me to start, but I'm really concerned that after a while the DB would simply not keep up with the pace.

My question is: should I pick an RDBMS or a NoSQL solution, and why? If a NoSQL solution, since they are so different from each other, which one do you think fits my needs?

Roy Reznik asked Mar 14 '13 09:03



1 Answer

Generally there isn't a right or wrong answer here.

Fast data writing: either solution will be OK, although you didn't say what volume per second you are storing. Both solutions have things to watch out for.

Search (very quick) over all columns: for smaller volumes, say a few hundred GB, either solution will be OK (assuming skilled people put it together). You didn't say how fast or how often you search; if it is many times per minute, this consideration becomes more important. Fast search can also slow down high-volume writes, because the indexes that make search fast have to be updated on every insert.

Audit records typically have a time component, so a search that is time-constrained, e.g. within the last 7 days, will be significantly faster than a search over all records.
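To make the time-constraint point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `audit_log`, its columns, the index `idx_audit_user_ts` and the helper functions are all made up for illustration; the same idea applies to any RDBMS (and to most NoSQL stores that support range scans on a key).

```python
import sqlite3
from datetime import datetime, timedelta


def build_demo_db(now, days=30, per_day=10):
    """Create an in-memory audit table (hypothetical schema) with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE audit_log (
            id        INTEGER PRIMARY KEY,
            ts        TEXT NOT NULL,  -- ISO-8601 text, sorts chronologically
            user_name TEXT NOT NULL,
            action    TEXT NOT NULL
        )
        """
    )
    # Composite index: equality on user_name, then a range scan on ts.
    conn.execute("CREATE INDEX idx_audit_user_ts ON audit_log (user_name, ts)")
    rows = []
    for d in range(days):
        for i in range(per_day):
            ts = (now - timedelta(days=d, minutes=i)).isoformat(timespec="seconds")
            user = "alice" if i % 2 == 0 else "bob"
            rows.append((ts, user, "login"))
    conn.executemany(
        "INSERT INTO audit_log (ts, user_name, action) VALUES (?, ?, ?)", rows
    )
    return conn


def recent_actions(conn, user, now, days=7):
    """Return one user's actions from the last `days` days only."""
    since = (now - timedelta(days=days)).isoformat(timespec="seconds")
    cur = conn.execute(
        "SELECT ts, action FROM audit_log "
        "WHERE user_name = ? AND ts >= ? ORDER BY ts",
        (user, since),
    )
    return cur.fetchall()


if __name__ == "__main__":
    now = datetime(2013, 3, 14, 9, 0, 0)
    conn = build_demo_db(now)
    print(len(recent_actions(conn, "alice", now)), "rows in the last 7 days")
```

With the time bound in the WHERE clause, the database only has to walk the slice of the index that covers the requested window instead of every row ever written for that user. On a real system you would likely combine this with partitioning by date so that old data can be archived or dropped cheaply.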

Reporting: when you get up to 100 TB, you are going to need some real tricks, or a big budget, to get fast reporting. For static reporting, you will probably end up writing one program that generates multiple reports in a single pass over the data to save I/O. Dynamic reports will be the tricky one.
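As a rough sketch of the "one program, multiple reports" idea: read each record once and update every report's counters in the same loop, rather than scanning the data once per report. The record layout `(timestamp, user, action)` and the three example reports are assumptions for illustration, not anything from the question.

```python
from collections import Counter
from datetime import datetime


def build_reports(records):
    """Single pass over `records`, feeding three predefined reports at once.

    `records` is any iterable of (ts: datetime, user: str, action: str).
    """
    daily_by_action = Counter()   # (YYYY-MM-DD, action) -> count
    weekly_by_user = Counter()    # (ISO year-week, user) -> count
    monthly_total = Counter()     # YYYY-MM -> count

    for ts, user, action in records:
        daily_by_action[(ts.date().isoformat(), action)] += 1
        iso_year, iso_week, _ = ts.isocalendar()
        weekly_by_user[(f"{iso_year}-W{iso_week:02d}", user)] += 1
        monthly_total[ts.strftime("%Y-%m")] += 1

    return daily_by_action, weekly_by_user, monthly_total


if __name__ == "__main__":
    sample = [
        (datetime(2013, 3, 11, 8, 0), "alice", "login"),
        (datetime(2013, 3, 14, 9, 3), "alice", "delete"),
        (datetime(2013, 3, 14, 10, 0), "bob", "login"),
        (datetime(2013, 4, 1, 0, 0), "bob", "login"),
    ]
    daily, weekly, monthly = build_reports(sample)
    for key, count in sorted(monthly.items()):
        print(key, count)
```

Because `records` can be a generator that streams rows from the database or from log files, memory use depends on the number of distinct report keys, not on the size of the raw data. Dynamic (ad hoc) reports do not fit this pattern, which is why they are the harder case.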

My opinion? Since you know RDBMSs, I would start with one and ship the solution. This buys you time to learn the real problems you will encounter (the "no premature optimization" principle that many on SO are keen on). During this initial period you can start evaluating NoSQL solutions and learning them. I am assuming here that you want to run your own hardware/database; if you want to use cloud-type solutions, go to them straight away.

rlb answered Oct 18 '22 06:10
