 

Implement a ring buffer

We have a table logging data. It is logging at, say, 15K rows per second.

Question: How would we limit the table size to the 1bn newest rows?

i.e. once 1bn rows is reached, it becomes a ring buffer, deleting the oldest row when adding the newest.

Triggers might load the system too much. Here's a trigger example on SO. We are already using a bunch of tweaks to keep the speed up (such as stored procedures, Table Parameters, etc.).

Edit (8 years on):

My recent question/answer here addresses a similar issue using a time series database.

Nick T asked Mar 28 '14 11:03


People also ask

How are circular buffers implemented?

Circular buffers can be implemented in two ways: using an array or a linked list. An empty object array, together with its capacity, is initialized inside the constructor, since the type of the elements to be added is unknown. Two pointers, head and tail, are maintained for inserting and deleting elements.

How do ring buffers work?

Circular buffers (also known as ring buffers) are fixed-size buffers that work as if the memory is contiguous & circular in nature. As memory is generated and consumed, data does not need to be reshuffled – rather, the head/tail pointers are adjusted. When data is added, the head pointer advances.
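A minimal sketch of the array-backed version described above: a pre-allocated fixed-size list plus head/tail indices that wrap around with modulo arithmetic. The class name, method names, and the overwrite-the-oldest-when-full policy are choices made here for illustration, not something specified in the question.

```python
class RingBuffer:
    """Fixed-capacity ring buffer backed by a plain list.

    Illustrative sketch only: names and the overwrite-when-full policy
    are assumptions, not taken from the question above.
    """

    def __init__(self, capacity):
        self.buf = [None] * capacity   # pre-allocated fixed-size storage
        self.capacity = capacity
        self.head = 0                  # next slot to write
        self.tail = 0                  # oldest element still stored
        self.size = 0

    def push(self, item):
        self.buf[self.head] = item
        self.head = (self.head + 1) % self.capacity
        if self.size == self.capacity:
            # Buffer was full: the write just overwrote the oldest element,
            # so advance the tail past it.
            self.tail = (self.tail + 1) % self.capacity
        else:
            self.size += 1

    def pop(self):
        if self.size == 0:
            raise IndexError("pop from empty ring buffer")
        item = self.buf[self.tail]
        self.buf[self.tail] = None
        self.tail = (self.tail + 1) % self.capacity
        self.size -= 1
        return item
```

The modulo arithmetic is what makes the fixed array behave as if it were circular: no data is ever moved, only the two indices advance.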

What is a ring buffer Python?

A ring buffer is a buffer with a fixed size. When it fills up, adding another element overwrites the oldest one that was still being kept. It's particularly useful for the storage of log information and history. There is no direct support in Python for this kind of structure, but it's easy to construct one.
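For the log/history use case just mentioned, the Python standard library already covers this: collections.deque with a maxlen discards the oldest entry as each new one is appended. A small sketch:

```python
from collections import deque

# A deque with maxlen behaves as a ring buffer: once maxlen entries are
# held, appending a new one silently drops the oldest.
history = deque(maxlen=3)

for line in ["log 1", "log 2", "log 3", "log 4"]:
    history.append(line)

print(list(history))   # ['log 2', 'log 3', 'log 4'] - "log 1" was overwritten
```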

Is circular buffer the same as ring buffer?

A ring buffer (also known as a circular buffer or a circular queue) is a buffer data structure that behaves as if it had a circular shape, in which the last element in the buffer is connected to the first element. Like standard buffers, ring buffers typically have a fixed size.


1 Answer

Unless there is something magic about 1 billion, I think you should consider other approaches.

The first that comes to mind is partitioning the data. Say, put one hour's worth of data into each partition. This will result in about 15,000*60*60 = 54 million records in a partition, so roughly 18-19 partitions hold a billion rows. Once you are past that point, you can simply drop the oldest partition each time a new one fills.
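A quick back-of-the-envelope check of those figures, using the 15K rows per second from the question:

```python
rows_per_second = 15_000
rows_per_partition = rows_per_second * 60 * 60          # one hour per partition

partitions_for_1bn = 1_000_000_000 / rows_per_partition  # partitions needed for a billion rows
print(rows_per_partition, round(partitions_for_1bn, 1))  # 54000000 18.5
```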

One big advantage of partitioning is that the insert performance should work well and you don't have to delete individual records. There can be additional overheads depending on the query load, indexes, and other factors. But, with no additional indexes and a query load that is primarily inserts, it should solve your problem better than trying to delete 15,000 records each second along with the inserts.

Gordon Linoff answered Sep 16 '22 17:09