Say a, b, c are all List<T> and I want to create an unsorted union of them. Although performance isn't super-critical, they might each have 10,000 entries, so I'm keen to avoid O(n^2) solutions. AFAICT the MSDN documentation doesn't say anything about the performance characteristics of Union as far as the different types are concerned. My gut instinct says that if I just do a.Union(b).Union(c), this will take O(n^2) time, but new HashSet<T>(a).Union(b).Union(c) would be O(n). Does anyone have any documentation or metrics to confirm or deny this assumption?


What is the simplest way to achieve O(n) performance when creating the union of 3 IEnumerables?

3 Answers

You should use Enumerable.Union because it is as efficient as the HashSet approach. Complexity is O(n+m) because:

Enumerable.Union

When the object returned by this method is enumerated, Union<TSource> enumerates first and second in that order and yields each element that has not already been yielded.

Source-code here.
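As a rough illustration of why this is linear, here is a simplified sketch of the behaviour the documentation describes. It is not the library's source code: UnionSketch is a hypothetical name, and the sketch assumes the default equality comparer. Each element is hashed once and yielded only if the set has not seen it yet, which gives O(n+m) work for two inputs. It is written with C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// UnionSketch is a hypothetical name; this is an illustration, not the LINQ source.
IEnumerable<T> UnionSketch<T>(IEnumerable<T> first, IEnumerable<T> second)
{
    var seen = new HashSet<T>();

    // HashSet<T>.Add returns false when the item is already present,
    // so each element is yielded at most once.
    foreach (var item in first)
        if (seen.Add(item)) yield return item;

    foreach (var item in second)
        if (seen.Add(item)) yield return item;
}

var result = new List<int>(UnionSketch(new[] { 1, 2, 2, 3 }, new[] { 3, 4, 1 }));
Console.WriteLine(string.Join(", ", result)); // prints "1, 2, 3, 4"
```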

Ivan is right: there is overhead if you use Enumerable.Union with multiple collections, since a new set must be created for every chained call. So it might be more efficient (in terms of memory consumption) to use one of these approaches:

1. Concat + Distinct:

```
a.Concat(b).Concat(c)...Concat(x).Distinct()
```

2. Union + Concat:

```
a.Union(b.Concat(c)...Concat(x))
```

3. HashSet<T> constructor that takes IEnumerable<T> (e.g. with int):

```
new HashSet<int>(a.Concat(b).Concat(c)...Concat(x))
```

The difference between the first two might be negligible. The third approach does not use deferred execution; it creates a HashSet<> in memory right away. That is a good and efficient choice if you need this collection type or if this is the final operation on the query. But if you need to perform further operations on the chained query, you should prefer either Concat + Distinct or Union + Concat.
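To make the deferred versus eager distinction concrete, here is a small sketch with three hypothetical sample lists (the data and variable names are illustrative assumptions). The two LINQ queries are only evaluated when enumerated, so they pick up an element added to a afterwards; the HashSet was already built and does not. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for the three lists in the question.
var a = new List<int> { 1, 2, 3 };
var b = new List<int> { 3, 4, 5 };
var c = new List<int> { 5, 6, 1 };

// 1. Concat + Distinct: deferred, nothing is enumerated yet.
IEnumerable<int> viaDistinct = a.Concat(b).Concat(c).Distinct();

// 2. Union + Concat: deferred, with a single Union call.
IEnumerable<int> viaUnion = a.Union(b.Concat(c));

// 3. HashSet<T> constructor: eager, the set is filled immediately.
var viaHashSet = new HashSet<int>(a.Concat(b).Concat(c));

// Changing a source after the queries are defined shows the difference:
// the deferred queries see the new element, the already-built set does not.
a.Add(7);

Console.WriteLine(viaDistinct.Contains(7)); // True
Console.WriteLine(viaUnion.Contains(7));    // True
Console.WriteLine(viaHashSet.Contains(7));  // False
```

If a snapshot is what you want, the eager HashSet (or a ToList() on either query) gives you one.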

answered Oct 09 '22 02:10

Tim Schmelter

While @Tim Schmelter is right about the linear time complexity of the Enumerable.Union method, chaining multiple Union operators has a hidden overhead: every Union operator internally creates a hash set which basically duplicates the one from the previous operator (plus additional items), thus using much more memory than the single HashSet approach.

If we take into account the fact that Union is simply a shortcut for Concat + Distinct, the scalable LINQ solution with the same time/space complexity as the HashSet approach will be:

```
a.Concat(b).Concat(c)...Concat(x).Distinct()
```
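For an arbitrary number of sequences, the same idea can be wrapped in a small helper. This is only a sketch: UnionAll is a hypothetical name, not part of LINQ, and it uses SelectMany to flatten the sources in order (equivalent in effect to the chained Concat calls) before a single Distinct. It is written with C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// UnionAll is a hypothetical helper name, not a LINQ method.
// SelectMany walks the sources one after another, and the single
// Distinct call means only one set is kept for the whole operation.
IEnumerable<T> UnionAll<T>(params IEnumerable<T>[] sources) =>
    sources.SelectMany(source => source).Distinct();

// Hypothetical sample data.
var a = new List<string> { "x", "y" };
var b = new List<string> { "y", "z" };
var c = new List<string> { "z", "w", "x" };

var merged = UnionAll(a, b, c).ToList();
Console.WriteLine(merged.Count); // 4
```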

answered Oct 09 '22 02:10

Ivan Stoev

Union is O(n).

a.Union(b).Union(c) is less efficient in most implementations than a.Union(b.Concat(c)) because it creates a hash-set for the first union operation and then another for the second, as other answers have said. Both of these also end up with a chain of IEnumerator<T> objects in use which increases cost as further sources are added.

a.Union(b).Union(c) is more efficient in .NET Core because the second .Union() operation produces a single object with knowledge of a, b and c and it will create a single hash-set for the entire operation, as well as avoiding the chain of IEnumerator<T> objects.
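Since the result depends on the runtime, it may be worth measuring on your own target. The following is a rough sketch only: the data sizes mirror the question, Measure is a hypothetical helper, and a single Stopwatch run is noisy, so a proper benchmarking library would give more trustworthy numbers. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical data: three overlapping lists of 10,000 entries each.
var a = Enumerable.Range(0, 10_000).ToList();
var b = Enumerable.Range(5_000, 10_000).ToList();
var c = Enumerable.Range(10_000, 10_000).ToList();

// Measure is a hypothetical helper: it times one full enumeration of a query.
(int Count, TimeSpan Elapsed) Measure(IEnumerable<int> query)
{
    var stopwatch = Stopwatch.StartNew();
    int count = query.Count();
    stopwatch.Stop();
    return (count, stopwatch.Elapsed);
}

var chained = Measure(a.Union(b).Union(c));
var single = Measure(a.Union(b.Concat(c)));

// Both forms produce the same 20,000 distinct values; only the timings differ.
Console.WriteLine($"a.Union(b).Union(c):  {chained.Count} items in {chained.Elapsed}");
Console.WriteLine($"a.Union(b.Concat(c)): {single.Count} items in {single.Elapsed}");
```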

answered Oct 09 '22 02:10

Jon Hanna


Tags: c#, linq

Andy
