What is the fastest way to calculate a frequency distribution for an array in C#?

I am just wondering what the best approach is for that calculation. Let's assume I have an input array of values and an array of boundaries, and I want to calculate/bucketize the frequency distribution for each segment defined by the boundaries array.

Is it a good idea to use bucket search for that?

Actually, I found this question: Calculating frequency distribution of a collection with .Net/C#

But I do not understand how to use buckets for that purpose, because the size of each bucket can be different in my situation.

EDIT: After all the discussion I have an inner/outer loop solution, but I still want to eliminate the inner loop, perhaps with a Dictionary, to get O(n) performance. If I understood correctly, I need to hash the input values into a bucket index, so we need some sort of hash function with O(1) complexity. Any ideas how to do that?
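For example, if the buckets all had equal width, the bucket index could be computed directly, with no search at all. A minimal sketch of that idea (min and width are hypothetical names here, and my real boundaries are not equal-width):

// Sketch only: assumes equal-width buckets.
// min is the lower bound of the first bucket, width the uniform bucket size.
int GetBucketIndex(double value, double min, double width, int bucketCount)
{
    if (value < min) return -1;               // below the first boundary
    int index = (int)((value - min) / width); // O(1) arithmetic "hash"
    return index < bucketCount ? index : -1;  // -1 when above the last boundary
}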

asked Aug 31 '11 at 15:08 by Andrey



2 Answers

Bucket Sort is already O(n^2) worst case, so I would just do a simple inner/outer loop here. Since your bucket array is necessarily shorter than your input array, keep it on the inner loop. Since you're using custom bucket sizes, there are really no mathematical tricks that can eliminate that inner loop.

int[] freq = new int[buckets.Length - 1]; // one counter per bucket
foreach (int d in input)
{
    for (int i = 0; i < buckets.Length - 1; i++)
    {
        // bucket i covers the half-open range [buckets[i], buckets[i+1])
        if (d >= buckets[i] && d < buckets[i + 1])
        {
            freq[i]++;
            break;
        }
    }
}

It's also O(n^2) worst case but you can't beat the code simplicity. I wouldn't worry about optimization until it becomes a real issue. If you have a larger bucket array, you could use a binary search of some sort. But, since frequency distributions are typically < 100 elements, I doubt you'd see a lot of real-world performance benefit.
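A minimal sketch of that binary-search variant, assuming buckets is the same sorted boundaries array as above: Array.BinarySearch returns the bitwise complement of the insertion point when the value is not found, which maps directly onto a bucket index.

int[] freq = new int[buckets.Length - 1];
foreach (int d in input)
{
    int pos = Array.BinarySearch(buckets, d);
    // An exact hit on a boundary starts that bucket; otherwise ~pos is the
    // index of the first boundary greater than d, so d falls in bucket ~pos - 1.
    int bucket = pos >= 0 ? pos : ~pos - 1;
    if (bucket >= 0 && bucket < freq.Length)
        freq[bucket]++;
}

This drops the per-element cost from O(m) to O(log m), where m is the number of buckets.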

answered Sep 19 '22 at 15:09 by drharris


If your input array represents real-world data (with its patterns) and the array of boundaries is too large to iterate again and again in an inner loop, you can consider the following approach:

  • First of all, sort your input array. If you work with real-world data, I would recommend considering Timsort (Wiki) for this. It provides very good performance guarantees for the patterns that can be seen in real-world data.

  • Traverse the sorted array and compare each value with the current boundary in the array of boundaries:

    • If the value in the input array is less than the boundary, increment the frequency counter for this boundary.
    • If the value in the input array is greater than or equal to the boundary, move on to the next boundary and increment the counter for the new boundary.

In code it can look like this:

Timsort(myArray); // Timsort assumed available; .NET's built-in Array.Sort would also work
int boundPos = 0;
int[] boundaries = GetBoundaries(); // assume a sorted array of upper bucket boundaries
int[] freq = new int[boundaries.Length]; // one frequency counter per boundary

for (int i = 0; i < myArray.Length; i++) {
  // advance past every boundary the current (sorted) value has already crossed
  while (boundPos < boundaries.Length - 1 && myArray[i] >= boundaries[boundPos]) {
    boundPos++;
  }
  freq[boundPos]++;
}
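
With the input sorted, this sweep touches each element once and advances through the boundaries at most once overall, so after the O(n log n) sort the counting pass itself is O(n + m).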

answered Sep 20 '22 at 15:09 by Andrey Taptunov