First of all, this is more of a math question than a coding one, so please be patient. I am trying to figure out an algorithm that calculates the mean of a set of numbers while ignoring any numbers that are far from the majority of the results. Here is an example of what I am trying to do:
Let's say I have a set of numbers similar to the following:
{ 90, 91, 92, 95, 2, 3, 99, 92, 92, 91, 300, 91, 92, 99, 400 }
It is clear that for the set above, the majority of the numbers lie between 90 and 99, but there are some outliers like { 2, 3, 300, 400 }. I need to calculate the mean of these numbers while ignoring the outliers. I remember reading about something like this in a statistics class, but I can't remember what it was or how to approach the solution.
I'd appreciate any help. Thanks!
Changing the divisor: to see how an outlier affects the mean of a data set, compute the mean with the outlier included, then compute it again with the outlier removed. Removing the outlier decreases the number of data points by one, so you must decrease the divisor accordingly.
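A small worked example (my own numbers, not from the post) showing the recomputation described above: the outlier shifts the mean, and removing it shrinks the divisor from 4 to 3.

```python
data = [90, 91, 92, 300]  # 300 is the outlier

mean_with = sum(data) / len(data)           # (90+91+92+300) / 4 = 143.25
trimmed = [x for x in data if x != 300]     # drop the outlier
mean_without = sum(trimmed) / len(trimmed)  # (90+91+92) / 3 = 91.0

print(mean_with)     # 143.25
print(mean_without)  # 91.0
```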
Outliers increase the variability in your data, which decreases statistical power. Consequently, excluding outliers can cause your results to become statistically significant.
In most cases, outliers influence the mean, but not the median or the mode. Their main effect is therefore on the mean. There is no single, universal rule for identifying outliers.
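A quick illustration of the claim above (again my own example): a single large outlier moves the mean substantially, while the median barely moves.

```python
from statistics import mean, median

clean = [90, 91, 92, 95, 99]
with_outlier = clean + [400]  # one extreme value appended

# The mean jumps by ~51; the median shifts by only 1.5.
print(mean(clean), mean(with_outlier))      # 93.4 144.5
print(median(clean), median(with_outlier))  # 92 93.5
```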
What you could do is:

1. Compute the first and third quartiles (Q1, Q3) and the interquartile range, IQR = Q3 - Q1.
2. Discard every value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.
3. Take the mean of the remaining values.
PS: Outliers constituting 25% of your dataset is a lot!
PPS: For the second step, we assumed the outliers are "symmetrically distributed", using 4-quantiles (quartiles) and a fence of 1.5 times the interquartile range (IQR) below Q1 and above Q3.
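Here is a minimal sketch of that IQR approach applied to the question's data set, using only the standard library. The 1.5 multiplier and the "inclusive" quantile method are conventional choices on my part, not something the answer mandates:

```python
from statistics import quantiles, mean

data = [90, 91, 92, 95, 2, 3, 99, 92, 92, 91, 300, 91, 92, 99, 400]

# Quartiles via linear interpolation ("inclusive" matches numpy's default).
q1, _q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only values inside the fences; this drops 2, 3, 300, and 400.
kept = [x for x in data if low <= x <= high]

print(kept)
print(mean(kept))  # roughly 93.09
```

With this data set the fences come out to [82, 106], so exactly the four outliers from the question are discarded before averaging.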