Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stream data from MySQL Binary Log to Kinesis

We have a write-intensive table (on AWS RDS MySQL) from a legacy system and we'd like to stream every write event (insert or updated) from that table to kinesis. The idea is to create a pipe to warmup caches and update search engines.

Currently we do that using a rudimentar polling architecture, basically using SQL, but the ideal would be to have a push architecture reading the events directly from the transaction log.

Has anyone tried it? Any suggested architecture?

like image 451
David Lojudice Sb. Avatar asked Aug 17 '16 23:08

David Lojudice Sb.


People also ask

How do I post data to Kinesis stream?

To put data into the stream, you must specify the name of the stream, a partition key, and the data blob to be added to the stream. The partition key is used to determine which shard in the stream the data record is added to. All the data in the shard is sent to the same worker that is processing the shard.

Can Kinesis Data Stream store data?

Amazon Kinesis Data Streams is a serverless streaming data service that makes it easy to capture, process, and store data streams at any scale.


1 Answers

I've worked with some customers doing that already, in Oracle. Seems also that LinkedIn uses a lot that technique of streaming data from databases to somewhere else. They created a platform called Databus to accomplish that in an agnostic way - https://github.com/linkedin/databus/wiki/Databus-for-MySQL.

There is a public project in Github, following LinkedIn principles that is already streaming binlog from Mysql to Kinesis Streams - https://github.com/cmerrick/plainview

If you want to get into the nitty gritty details of LinkedIn approach, there is a really nice (and extensive) blog post available - https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying.

Last but not least, Yelp is doing that as well, but with Kafka - https://engineeringblog.yelp.com/2016/08/streaming-mysql-tables-in-real-time-to-kafka.html

Not getting into the basics of Kinesis Streams, for the sake of brevity, if we bring Kinesis Streams to the game, I don't see why it shouldn't work. As a matter of fact, it was built for that - your database transaction log is a stream of events. Borrowing an excerpt from Amazon Web Services public documentation: Amazon Kinesis Streams allows for real-time data processing. With Amazon Kinesis Streams, you can continuously collect data as it is generated and promptly react to critical information about your business and operations.

Hope this helps.

like image 150
Felipe Garcia Avatar answered Oct 13 '22 07:10

Felipe Garcia