Algorithm to find the most realistic average market price in a dataset

What I have:

  • users are selling foobars on an auction site.
  • each foobar is identical.
  • price of foobar determined by user.
  • I will be scraping each price listing to form a data set that looks like:
    $prices = array('foobar' => array(12.34, 15.22, 14.18, 20.55, 9.50));

What I need:

  • to find a realistic average market price for each day, week, month.

Problems I face:

  • Outlier rejection implementations are not working very well because the data is biased.
  • It is extremely unlikely that a user will commit their auction at a price way below the average market price, because a listing cannot be undone. Even when a listing is way below market price, this happens so infrequently that the overall average is not affected. However, users trying to drive their prices up are much more likely, and this happens frequently enough to distort the realistic average market value.

What I think I'm going to do about it:

Daniel Collicott:

If I understand you correctly, you want to calculate the optimal selling value of an item (or are you trying to calculate the real value?).

Sellers are quite naturally gaming the system (e.g. on eBay), trying to maximize their profits.

For this reason, I would avoid average/SD approaches: they are too sensitive to outliers created by particular selling tactics.

Game-theory-wise, I think clever sellers would estimate the highest likely selling price (maximal profits) by researching their competitors and their historical sales output, to find the sweet spot.

For this reason I would record a histogram of historical prices over all sellers and look at the distribution of prices, using something approaching the mode to determine the optimal price, i.e. the most common sale price. Better still, I would weight prices by the profit (proportional to historical sales volume) of each individual seller.

I suspect this would be nearer to your optimal market value; if you are looking for the real market value, then comment below or contact me at my machine learning firm.
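
The histogram idea quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions: modalPrice(), the $binWidth parameter and the optional $weights array (standing in for each seller's historical sales volume) are hypothetical names that do not appear in the quoted post, and the bin width has to be chosen to suit the data.

```php
<?php
// Hypothetical helper: returns the centre of the most populated price bin.
// $binWidth and $weights are assumptions for illustration only.
function modalPrice(array $prices, float $binWidth = 1.0, array $weights = array()): ?float
{
    if (count($prices) === 0)
    {
        return null;
    }

    // Build the histogram: bin index => (weighted) number of listings.
    $histogram = array();

    foreach ($prices as $key => $price)
    {
        $bin = (int) floor($price / $binWidth);
        $histogram[$bin] = ($histogram[$bin] ?? 0) + ($weights[$key] ?? 1);
    }

    // Pick the fullest bin; ties resolve to the lowest-priced bin.
    ksort($histogram);
    $bestBin = null;

    foreach ($histogram as $bin => $count)
    {
        if ($bestBin === null || $count > $histogram[$bestBin])
        {
            $bestBin = $bin;
        }
    }

    return ($bestBin + 0.5) * $binWidth;
}

$prices = array(12.34, 15.22, 14.18, 20.55, 99.50, 15.88, 16.99, 102.55);

// With bins of width 2, the 14-16 bin holds three listings, so this prints 15.
echo modalPrice($prices, 2.0), PHP_EOL;
```

Because only the fullest bin matters, a handful of inflated listings such as 99.50 or 102.55 has no effect on the result, which is the property the quoted post is after. Passing sales volumes as $weights would let high-volume sellers count for more.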

Questions I have:

  • A more detailed explanation of the things referred to in @Daniel Collicott's post:

    --> optimal selling value
    --> real selling value
    --> algorithms for both

Dan Kanze asked Apr 29 '12 23:04


1 Answer

Your first problem is pretty straightforward to solve using the average and the standard deviation:

$prices = array
(
    'bar' => array(12.34, 102.55),
    'foo' => array(12.34, 15.66, 102.55, 134.66),
    'foobar' => array(12.34, 15.22, 14.18, 20.55, 99.50, 15.88, 16.99, 102.55),
);

foreach ($prices as $item => $bids)
{
    $average = call_user_func_array('Average', $bids);
    $standardDeviation = call_user_func_array('standardDeviation', $bids);

    foreach ($bids as $key => $bid)
    {
        if (($bid < ($average - $standardDeviation)) || ($bid > ($average + $standardDeviation)))
        {
            unset($bids[$key]);
        }
    }

    $prices[$item] = $bids;
}

print_r($prices);

Basically you just need to remove bids lower than avg - stDev or higher than avg + stDev.


And the actual functions (ported from my framework):

function Average()
{
    if (count($arguments = func_get_args()) > 0)
    {
        return array_sum($arguments) / count($arguments);
    }

    return 0;
}

function standardDeviation()
{
    if (count($arguments = func_get_args()) > 0)
    {
        $result = call_user_func_array('Average', $arguments);

        foreach ($arguments as $key => $value)
        {
            $arguments[$key] = pow($value - $result, 2);
        }

        return sqrt(call_user_func_array('Average', $arguments));
    }

    return 0;
}

Output (demo):

Array
(
    [bar] => Array
        (
            [0] => 12.34
            [1] => 102.55
        )

    [foo] => Array
        (
            [1] => 15.66
            [2] => 102.55
        )

    [foobar] => Array
        (
            [0] => 12.34
            [1] => 15.22
            [2] => 14.18
            [3] => 20.55
            [5] => 15.88
            [6] => 16.99
        )
)
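
To see where the cut-offs in the output come from, the bounds for the 'foo' bids can be recomputed on their own. This is a standalone sketch rather than part of the original answer; the one-sided filter at the end is a hypothetical variation, motivated only by the question's remark that outliers are almost always on the high side.

```php
<?php
// Standalone check of the bounds for the 'foo' bids from the answer.
$bids = array(12.34, 15.66, 102.55, 134.66);

$mean = array_sum($bids) / count($bids);

// Population standard deviation (divide by N), matching standardDeviation() above.
$squares = array();

foreach ($bids as $bid)
{
    $squares[] = pow($bid - $mean, 2);
}

$stDev = sqrt(array_sum($squares) / count($squares));

$lower = $mean - $stDev;
$upper = $mean + $stDev;

printf("keep bids between %.2f and %.2f\n", $lower, $upper);
// prints: keep bids between 12.77 and 119.84

// Hypothetical one-sided variant: only reject bids above the upper bound.
$kept = array();

foreach ($bids as $bid)
{
    if ($bid <= $upper)
    {
        $kept[] = $bid;
    }
}

print_r($kept); // 12.34, 15.66 and 102.55 remain
```

This explains why the two-sided filter dropped 12.34 (just under the lower bound) while keeping 102.55. With so few bids, two high values pull both the mean and the standard deviation upward, so the result should be treated with some caution on small or heavily skewed samples.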

Alix Axel answered Sep 30 '22 00:09