Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I combine two streams ordered then grouped by timestamp?

I have two streams of objects that each have a Timestamp value. Both streams are in order, so for example the timestamps might be Ta = 1,3,6,6,7 in one stream and Tb = 1,2,5,5,6,8 in the other. Objects in both streams are of the same type.

What I'd like to be able to do is to put each of these events on the bus in order of timestamp, i.e., put A1, then B1, B2, A3 and so on. Furthermore, since some streams have several (sequential) elements with the same timestamp, I want those elements grouped so that each new event is an array. So we would put [A3] on the bus, followed by [A15,A25] and so on.

I've tried to implement this by making two ConcurrentQueue structures, putting each event at the back of the queue, then looking at each front of the queue, choosing first the earlier event and then traversing the queue such that all events with this timestamp are present.

However, I've encountered two problems:

  • If I leave these queues unbounded, I quickly run out of memory as the read op is a lot faster than the handlers receiving the events. (I've got a few gigabytes of data).
  • I sometimes end up with a situation where I handle the event, say, A15 before A25 has arrived. I somehow need to guard against this.

I'm thinking that Rx can help in this regard but I don't see an obvious combinator(s) to make this possible. Thus, any advice is much appreciated.

like image 720
Dmitri Nesteruk Avatar asked Mar 16 '12 18:03

Dmitri Nesteruk


1 Answers

Rx is indeed a good fit for this problem IMO.

IObservables can't 'OrderBy' for obvious reasons (you would have to observe the entire stream first to guarantee the correct output order), so my answer below makes the assumption (that you stated) that your 2 source event streams are in order.

It was an interesting problem in the end. The standard Rx operators are missing a GroupByUntilChanged that would have solved this easily, as long as it called OnComplete on the previous group observable when the first element of the next group was observed. However looking at the implementation of DistinctUntilChanged it doesn't follow this pattern and only calls OnComplete when the source observable completes (even though it knows there will be no more elements after the first non-distinct element... weird???). Anyway, for those reasons, I decided against a GroupByUntilChanged method (to not break Rx conventions) and went instead for a ToEnumerableUntilChanged.

Disclaimer: This is my first Rx extension so would appreciate feedback on my choices made. Also, one main concern of mine is the anonymous observable holding the distinctElements list.

Firstly, your application code is quite simple:

    public class Event
    {
        public DateTime Timestamp { get; set; }
    }

    private IObservable<Event> eventStream1;
    private IObservable<Event> eventStream2; 

    public IObservable<IEnumerable<Event>> CombineAndGroup()
    {
        return eventStream1.CombineLatest(eventStream2, (e1, e2) => e1.Timestamp < e2.Timestamp ? e1 : e2)
            .ToEnumerableUntilChanged(e => e.Timestamp);
    }

Now for the ToEnumerableUntilChanged implementation (wall of code warning):

    public static IObservable<IEnumerable<TSource>> ToEnumerableUntilChanged<TSource,TKey>(this IObservable<TSource> source, Func<TSource,TKey> keySelector)
    {
        // TODO: Follow Rx conventions and create a superset overload that takes the IComparer as a parameter
        var comparer = EqualityComparer<TKey>.Default;

        return Observable.Create<IEnumerable<TSource>>(observer =>
        {
            var currentKey = default(TKey);
            var hasCurrentKey = false;
            var distinctElements = new List<TSource>();

            return source.Subscribe((value =>
            {
                TKey elementKey;
                try
                {
                    elementKey = keySelector(value);
                }
                catch (Exception ex)
                {
                    observer.OnError(ex);
                    return;
                }

                if (!hasCurrentKey)
                {
                    hasCurrentKey = true;
                    currentKey = elementKey;
                    distinctElements.Add(value);
                    return;
                }

                bool keysMatch;
                try
                {
                    keysMatch = comparer.Equals(currentKey, elementKey);
                }
                catch (Exception ex)
                {
                    observer.OnError(ex);
                    return;
                }

                if (keysMatch)
                {
                    distinctElements.Add(value);
                    return;
                }

                observer.OnNext( distinctElements);

                distinctElements.Clear();
                distinctElements.Add(value);
                currentKey = elementKey;

            }), observer.OnError, () =>
            {
                if (distinctElements.Count > 0)
                    observer.OnNext(distinctElements);

                observer.OnCompleted();
            });
        });
    }
like image 122
Tyson Avatar answered Sep 18 '22 23:09

Tyson