 

Perform event-sourcing projections over table storage

I'm creating a tiny event-sourcing-style function app, where every invocation of a function will write an event to table storage. An example of such an event would be:

+------------+---------------+-----------------+
|   Event    |   Timestamp   |   Destination   |
+------------+---------------+-----------------+
| Connect    | 7/1/2019 4:52 | sftp.alex.com   |
| Disconnect | 7/1/2019 4:53 | sftp.liza.com   |
| Connect    | 7/1/2019 4:54 | sftp.yomama.com |
| Connect    | 7/1/2019 4:54 | sftp.alex.com   |
| Connect    | 7/1/2019 4:59 | sftp.liza.com   |
| Disconnect | 7/1/2019 4:59 | sftp.alex.com   |
| Disconnect | 7/1/2019 4:59 | sftp.yomama.com |
| Connect    | 7/1/2019 5:03 | sftp.alex.com   |
+------------+---------------+-----------------+

How do I create a projection over this table?

The main question that I would need to answer is:

How many connections does each destination currently have?
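In other words, the desired projection folds each Connect/Disconnect event into a running count per destination. A minimal Python sketch of that fold (the event rows from the table above hard-coded as dicts; this is illustration only, not Azure Table storage code):

```python
from collections import Counter

def project_current_connections(events):
    """Fold Connect/Disconnect events into a per-destination connection count."""
    counts = Counter()
    for event in events:  # events assumed ordered by Timestamp
        if event["Event"] == "Connect":
            counts[event["Destination"]] += 1
        elif event["Event"] == "Disconnect":
            counts[event["Destination"]] -= 1
    return dict(counts)

events = [
    {"Event": "Connect",    "Timestamp": "7/1/2019 4:52", "Destination": "sftp.alex.com"},
    {"Event": "Disconnect", "Timestamp": "7/1/2019 4:53", "Destination": "sftp.liza.com"},
    {"Event": "Connect",    "Timestamp": "7/1/2019 4:54", "Destination": "sftp.yomama.com"},
    {"Event": "Connect",    "Timestamp": "7/1/2019 4:54", "Destination": "sftp.alex.com"},
    {"Event": "Connect",    "Timestamp": "7/1/2019 4:59", "Destination": "sftp.liza.com"},
    {"Event": "Disconnect", "Timestamp": "7/1/2019 4:59", "Destination": "sftp.alex.com"},
    {"Event": "Disconnect", "Timestamp": "7/1/2019 4:59", "Destination": "sftp.yomama.com"},
    {"Event": "Connect",    "Timestamp": "7/1/2019 5:03", "Destination": "sftp.alex.com"},
]
print(project_current_connections(events))
# → {'sftp.alex.com': 2, 'sftp.liza.com': 0, 'sftp.yomama.com': 0}
```

Note that this naive version replays every event on every query, which is exactly what the answers below try to avoid.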

asked Jul 09 '19 by Alex Gordon

2 Answers

I suppose there are a lot of records in the table and iterating over all of them is not an option.
So here are a couple of ideas:

  1. Can't you just keep track of the number of connections?

    That would be the easiest solution. I have no idea about your app and how it communicates with Azure, but at the very least there are triggers (although, judging by the supported bindings table, you will need some extra services, for example Queue storage). In a trigger you can store the current number of connections to each destination in a separate table, incrementing on a Connect event and decrementing on a Disconnect.

    But if you have a single writer (a single server that communicates with Azure), you can keep track of connections directly in your code.

    You can also save the current number of connections to the table in an extra field. As a bonus, you'll be able to instantly get the number of connections at any given time in the past (at a storage cost).
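A minimal sketch of that counter idea, with a plain in-memory Python dict standing in for the separate counter table (where this logic lives in your setup, e.g. inside an Azure trigger, is an assumption):

```python
class ConnectionCounter:
    """Incremental projection: update per-destination counts as each event
    arrives, instead of replaying the whole event stream on every query."""

    def __init__(self):
        self.current = {}  # destination -> current number of connections

    def apply(self, event_type, destination):
        # Increment on Connect, decrement on Disconnect.
        delta = {"Connect": 1, "Disconnect": -1}[event_type]
        self.current[destination] = self.current.get(destination, 0) + delta
        return self.current[destination]

counter = ConnectionCounter()
counter.apply("Connect", "sftp.alex.com")     # → 1
counter.apply("Connect", "sftp.alex.com")     # → 2
counter.apply("Disconnect", "sftp.alex.com")  # → 1
```

The return value of `apply` is the running count you could also persist as the "extra field" mentioned above.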

  2. Since you're talking about event sourcing, maybe you should use it once more? The idea is still the same: you keep track of Connect and Disconnect events, but in some external receiver. As you're writing an event-sourcing-style function app, I believe it should be easy to create one. And you won't have to depend on extra Azure services.

    Then the only difference from the first idea is recovery: if the receiver dies or disconnects, just remember the last event it received and, when the receiver is back online, iterate only over the newer events.

    This last received event that you must remember (plus the counters) is essentially the snapshot others were talking about in the comments.
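The snapshot-plus-replay recovery could be sketched like this (assuming each event carries a monotonically increasing `Seq` number, which is an assumption on my part; raw timestamps may collide, as the three 4:59 rows above show):

```python
def restore(snapshot_counts, last_seen_seq, events):
    """Rebuild receiver state after a restart: start from the snapshot's
    counters and replay only events newer than the snapshot."""
    counts = dict(snapshot_counts)
    for e in events:
        if e["Seq"] <= last_seen_seq:
            continue  # already reflected in the snapshot
        delta = 1 if e["Event"] == "Connect" else -1
        counts[e["Destination"]] = counts.get(e["Destination"], 0) + delta
    return counts

snapshot = {"sftp.alex.com": 2}  # counters as of Seq 4
newer = [
    {"Seq": 5, "Event": "Disconnect", "Destination": "sftp.alex.com"},
    {"Seq": 6, "Event": "Connect",    "Destination": "sftp.liza.com"},
]
print(restore(snapshot, 4, newer))
# → {'sftp.alex.com': 1, 'sftp.liza.com': 1}
```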

answered Oct 10 '22 by x00

Projections should be decoupled from the Event Stream, because they are business-driven while the Event Stream is a purely technical aspect.

To simplify the answer, I assume you are going to use SQL for persisting the projections, but any key/value data store will do.

You can create a DestinationEvents table with the following structure:

+------------------+-----------------+-------------------+
|   Destination    |   Connections   |   Disconnections  |
+------------------+-----------------+-------------------+
| sftp.alex.com    |        3        |        1          |
| sftp.liza.com    |        1        |        1          |
+------------------+-----------------+-------------------+

With proper indexing this should give both fast reads and writes. For extra speed consider something like Redis to cache your projections.
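Answering "how many connections does each destination currently have?" from this table is then a simple per-row difference; a sketch with the two rows above hard-coded:

```python
# Projection rows keyed by destination, mirroring the DestinationEvents table.
destination_events = {
    "sftp.alex.com": {"Connections": 3, "Disconnections": 1},
    "sftp.liza.com": {"Connections": 1, "Disconnections": 1},
}

def current_connections(row):
    # Current connections = total connects minus total disconnects.
    return row["Connections"] - row["Disconnections"]

print({d: current_connections(r) for d, r in destination_events.items()})
# → {'sftp.alex.com': 2, 'sftp.liza.com': 0}
```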

The tricky bit is the solution design: you want it to scale. A naive approach would be to set up a SQL trigger for each write into the Event Stream, but this will slow you down if you have lots of writes.

If you want scalability, you need to start thinking about budget (time and money) and business requirements. Do the projections need to be available in real time?

  • If not, you can have a scheduled process that computes/updates the projections at a certain interval: hourly, daily, weekly, etc.
  • If yes, you need to start looking into Queues/Message Brokers (RabbitMQ, Kafka, etc.). Now you are entering Producer/Consumer territory: your App is the Producer, it publishes events; the Event Stream and Projections storage are Consumers, they listen, transform and persist the incoming events. The Queue/Message Broker itself can even replace your Event Stream table; this is easy with Kafka.

If you just want to learn, start by defining an in-memory projection storage using a Dictionary<string, (int Connections, int Disconnections)>, where Destination acts as the Key and (int Connections, int Disconnections) is a tuple/class.

If you want to support other Projections, the in-memory storage can be a Dictionary<string, Dictionary<string, (int Connections, int Disconnections)>>, where the outer dictionary's Key is the Projection name.
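A sketch of that two-level storage in Python, with dicts and tuples standing in for the C# Dictionary and value tuple (the `record` helper name is mine):

```python
# In-memory projection storage keyed by projection name, as described above.
projections = {
    "DestinationEvents": {
        "sftp.alex.com": (3, 1),  # (Connections, Disconnections)
        "sftp.liza.com": (1, 1),
    }
}

def record(projection_name, destination, event_type):
    """Apply one event to the named projection, creating entries on first use."""
    store = projections.setdefault(projection_name, {})
    connections, disconnections = store.get(destination, (0, 0))
    if event_type == "Connect":
        connections += 1
    else:
        disconnections += 1
    store[destination] = (connections, disconnections)

record("DestinationEvents", "sftp.alex.com", "Disconnect")
print(projections["DestinationEvents"]["sftp.alex.com"])  # → (3, 2)
```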

answered Oct 10 '22 by Alexander Pope