I have an observable data stream that I am applying operations to, splitting into two separate streams, applying more (distinct) operations to each of the two streams, and merging together again. I am trying to share the observable between two subscribers using Publish and Connect, but each of the subscribers seems to be using a separate stream. That is, in the example below, I see "Doing an expensive operation" printed once for each item in the stream for both of the subscribers. (Imagine the expensive operation as being something that should happen only once between all subscribers; that is why I am trying to reuse the stream.) I have used Publish and Connect to try to share the merged observable with both subscribers, but it seems to have the wrong effect.
Example with the issue:
var foregroundScheduler = new NewThreadScheduler(ts => new Thread(ts) { IsBackground = false });
var timer = Observable.Timer(TimeSpan.Zero, TimeSpan.FromSeconds(10), foregroundScheduler);
var expensive = timer.Select(i =>
{
// Converting to strings is an expensive operation
Console.WriteLine("Doing an expensive operation");
return string.Format("#{0}", i);
});
var a = expensive.Where(s => int.Parse(s.Substring(1)) % 2 == 0).Select(s => new { Source = "A", Value = s });
var b = expensive.Where(s => int.Parse(s.Substring(1)) % 2 != 0).Select(s => new { Source = "B", Value = s });
var connectable = Observable.Merge(a, b).Publish();
connectable.Where(x => x.Source.Equals("A")).Subscribe(s => Console.WriteLine("Subscriber A got: {0}", s));
connectable.Where(x => x.Source.Equals("B")).Subscribe(s => Console.WriteLine("Subscriber B got: {0}", s));
connectable.Connect();
I see the following output:
Doing an expensive operation
Doing an expensive operation
Subscriber A got: { Source = A, Value = #0 }
Doing an expensive operation
Doing an expensive operation
Subscriber B got: { Source = B, Value = #1 }
(Output continues, truncated for brevity.)
How can I share the observable with both subscribers?
You have published the wrong observable.
With the current code you are merging and then publishing, like this: Observable.Merge(a, b).Publish(). Since a and b are both defined against expensive, you still get two subscriptions to expensive.
The subscriptions create these pipelines:
You can see this if you take out the .Publish() from your code. The output becomes:
Doing an expensive operation
Doing an expensive operation
Doing an expensive operation
Doing an expensive operation
Subscriber A got: { Source = A, Value = #0 }
Doing an expensive operation
Doing an expensive operation
Doing an expensive operation
Doing an expensive operation
Subscriber B got: { Source = B, Value = #1 }
This creates these pipelines:
So, by shifting the .Publish() back up to expensive you eliminate the problem. That's where you really needed it, because it is the expensive operation after all.
This is the code you needed:
var foregroundScheduler = new NewThreadScheduler(ts => new Thread(ts) { IsBackground = false });
var timer = Observable.Timer(TimeSpan.Zero, TimeSpan.FromSeconds(10), foregroundScheduler);
var expensive = timer.Select(i =>
{
// Converting to strings is an expensive operation
Console.WriteLine("Doing an expensive operation");
return string.Format("#{0}", i);
});
var connectable = expensive.Publish();
var a = connectable.Where(s => int.Parse(s.Substring(1)) % 2 == 0).Select(s => new { Source = "A", Value = s });
var b = connectable.Where(s => int.Parse(s.Substring(1)) % 2 != 0).Select(s => new { Source = "B", Value = s });
var merged = Observable.Merge(a, b);
merged.Where(x => x.Source.Equals("A")).Subscribe(s => Console.WriteLine("Subscriber A got: {0}", s));
merged.Where(x => x.Source.Equals("B")).Subscribe(s => Console.WriteLine("Subscriber B got: {0}", s));
connectable.Connect();
That nicely produces the following:
Doing an expensive operation
Subscriber A got: { Source = A, Value = #0 }
Doing an expensive operation
Subscriber B got: { Source = B, Value = #1 }
Doing an expensive operation
Subscriber A got: { Source = A, Value = #2 }
Doing an expensive operation
Subscriber B got: { Source = B, Value = #3 }
And this gives you these pipelines:
You can see from this image that there is still duplication. That's fine because these parts aren't expensive.
The duplication is actually important. Shared parts of the pipelines make their endpoints vulnerable to errors and thus to early termination. The less sharing the better for the robustness of the code. It's only when you have an expensive operation that you should worry about publishing. Otherwise you should just let the pipelines be themselves.
Here's an example to show it. If you don't have a published source, then an error produced by one source doesn't pull down all of the pipelines.
But once you introduce a shared observable, a single error will bring down all of the pipelines.
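The difference can be sketched in code. This is a minimal illustration, not part of the original example: FlakySource, Watch, and the class name are hypothetical, and the source is assumed to fail only on the very first call to its selector. Each unshared subscription runs its own copy of the pipeline, so only the subscriber that hits the failing call is terminated; with Publish, both subscribers sit behind the same subscription and both receive the error.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;

public static class SharedErrorSketch
{
    // Hypothetical flaky source: only the first call to the selector throws.
    // The counter is shared by every subscription to the returned observable.
    private static IObservable<int> FlakySource()
    {
        var calls = 0;
        return Observable.Range(0, 3).Select(i =>
        {
            if (calls++ == 0) throw new InvalidOperationException("flaky");
            return i;
        });
    }

    // Records everything a named subscriber sees, including termination.
    private static void Watch(IObservable<int> source, string name, List<string> log)
    {
        source.Subscribe(
            x => log.Add(name + " got " + x),
            ex => log.Add(name + " failed: " + ex.Message),
            () => log.Add(name + " completed"));
    }

    // Two independent subscriptions: each one re-runs the Select.
    public static List<string> Unshared()
    {
        var log = new List<string>();
        var source = FlakySource();
        Watch(source, "A", log);
        Watch(source, "B", log);
        return log;
    }

    // One shared subscription: the Select runs once for both subscribers.
    public static List<string> Shared()
    {
        var log = new List<string>();
        var shared = FlakySource().Publish();
        Watch(shared, "A", log);
        Watch(shared, "B", log);
        shared.Connect();
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(Environment.NewLine, Unshared()));
        Console.WriteLine("---");
        Console.WriteLine(string.Join(Environment.NewLine, Shared()));
    }
}
```

In the unshared run, A fails but B still receives all three values and completes. In the shared run, the single failure reaches both A and B.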
One possible fix:
var foregroundScheduler = new NewThreadScheduler(ts => new Thread(ts) { IsBackground = false });
var timer = Observable.Timer(TimeSpan.Zero, TimeSpan.FromSeconds(10), foregroundScheduler);
var expensive = timer.Select(i =>
{
// Converting to strings is an expensive operation
Console.WriteLine("Doing an expensive operation");
return string.Format("#{0}", i);
});
var subj = new ReplaySubject<string>();
expensive.Subscribe(subj);
var a = subj.Where(s => int.Parse(s.Substring(1)) % 2 == 0).Select(s => new { Source = "A", Value = s });
var b = subj.Where(s => int.Parse(s.Substring(1)) % 2 != 0).Select(s => new { Source = "B", Value = s });
var merged = Observable.Merge(a, b);
merged.Where(x => x.Source.Equals("A")).Subscribe(s => Console.WriteLine("Subscriber A got: {0}", s));
merged.Where(x => x.Source.Equals("B")).Subscribe(s => Console.WriteLine("Subscriber B got: {0}", s));
The above example essentially creates a new intermediate observable that emits the results of the expensive operation. This allows you to subscribe to the results of the expensive operation, not to an expensive transformation applied to a timer.
With this you'll see:
Doing an expensive operation
Subscriber A got: { Source = A, Value = #0 }
Doing an expensive operation
Subscriber B got: { Source = B, Value = #1 }
(Output continues, truncated for brevity.)
Alternatively, you could move the calls to Publish and Connect:
var foregroundScheduler = new NewThreadScheduler(ts => new Thread(ts) {IsBackground = false});
var timer = Observable.Timer(TimeSpan.Zero, TimeSpan.FromSeconds(10), foregroundScheduler);
var expensive = timer.Select(i =>
{
// Converting to strings is an expensive operation
Console.WriteLine("Doing an expensive operation");
return string.Format("#{0}", i);
}).Publish();
var a = expensive.Where(s => int.Parse(s.Substring(1)) % 2 == 0).Select(s => new { Source = "A", Value = s });
var b = expensive.Where(s => int.Parse(s.Substring(1)) % 2 != 0).Select(s => new { Source = "B", Value = s });
var merged = Observable.Merge(a, b);
merged.Where(x => x.Source.Equals("A")).Subscribe(s => Console.WriteLine("Subscriber A got: {0}", s));
merged.Where(x => x.Source.Equals("B")).Subscribe(s => Console.WriteLine("Subscriber B got: {0}", s));
expensive.Connect();
Why ReplaySubject, not just Subject or some other subject?
A Subject, in the .NET Rx implementation, is by default what the ReactiveX documentation calls a PublishSubject, which emits to an observer only those items that the source Observable emits after the observer subscribes. A ReplaySubject, on the other hand, emits to any observer all of the items that were emitted by the source Observable, regardless of when the observer subscribes. If we use a plain Subject in the first example, subscribing subj to the expensive operation starts the flow of items right away, so later subscribers to subj will miss anything emitted between the time subj subscribes to the expensive operation and the time they subscribe to the intermediate subject (subj).
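The late-subscriber behaviour can be checked with a small sketch. The class and method names here are hypothetical, and the values are pushed by hand rather than coming from the timer, so the ordering is deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Subjects;

public static class LateSubscriberSketch
{
    // Pushes one item before subscribing and one after, then returns
    // what the late subscriber actually received.
    public static List<string> Run(ISubject<string> subject)
    {
        var seen = new List<string>();
        subject.OnNext("#0");                 // emitted before anyone subscribes
        subject.Subscribe(s => seen.Add(s));  // the late subscriber
        subject.OnNext("#1");                 // emitted after subscribing
        return seen;
    }

    public static void Main()
    {
        // Prints "Subject: #1"
        Console.WriteLine("Subject: " + string.Join(", ", Run(new Subject<string>())));
        // Prints "ReplaySubject: #0, #1"
        Console.WriteLine("ReplaySubject: " + string.Join(", ", Run(new ReplaySubject<string>())));
    }
}
```

Note that a ReplaySubject created with the parameterless constructor buffers every item it has seen. For a long-running stream it may be worth considering the constructor overload that takes a buffer size.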