Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Price Filter Grouping Algorithm

I am creating an ecommerce site, and I am having trouble developing a good algorithm to sort a products that are pulled from the database into halfway appropriate groups. I have tried simply dividing the highest price into 4, and basing each group off that. I also tried standard deviations based around the mean. Both could result with price ranges that no product would fall into, which isn't a useful filtering option.

I also tried take quartiles of the products, but my problem is that the price ranges from $1 items to $4,000. The $4,000 almost never sell, and are far less important, but they keep skewing my results.

Any thoughts? I should have paid more attention in stats class ...

Update:

I ended up combining methods a bit. I used the quartile/bucket method, but hacked it a bit by hardcoding certain ranges within which a greater number of price groups would appear.

//Price range algorithm

sort($prices);

//Divide the number of prices into four groups
$quartilelength = count($prices)/4;

//Round to the nearest ...
$simplifier = 10;

//Get the total range of the prices
$range = max($prices)-min($prices);

//Assuming we actually are working with multiple prices
if ($range>0 )
{
    // If there is a decent spread in price, and there are a decent number of prices, give more price groups
    if ($range>20 && count($prices) > 10) 
    {
        $priceranges[0] = floor($prices[floor($quartilelength)]/$simplifier)*$simplifier;
    }

    // Always grab the median price
    $priceranges[1] = floor($prices[floor($quartilelength*2)]/$simplifier)*$simplifier;

    // If there is a decent spread in price, and there are a decent number of prices, give more price groups
    if ($range>20 && count($this->data->prices) > 10)
    {
        $priceranges[2] = floor($prices[floor($quartilelength*3)]/$simplifier)*$simplifier;
    }
}
like image 686
Dave W. Avatar asked Jul 21 '10 22:07

Dave W.


2 Answers

Here is an idea: basically you would sort the price into buckets of 10, each price as the key in the array, the value is a count of how many products are at the given price point:

public function priceBuckets($prices)
{    
    sort($prices);

    $buckets = array(array());
    $a = 0;

    $c = count($prices);
    for($i = 0; $i !== $c; ++$i) {
        if(count($buckets[$a]) === 10) {
            ++$a;
            $buckets[$a] = array();
        }

        if(isset($buckets[$a][$prices[$i]])) {
            ++$buckets[$a][$prices[$i]];
        } else if(isset($buckets[$a - 1][$prices[$i]])) {
            ++$buckets[$a - 1][$prices[$i]];
        } else {
            $buckets[$a][$prices[$i]] = 1;
        }
    }

    return $buckets;
}

//TEST CODE
$prices = array();

for($i = 0; $i !== 50; ++$i) {
    $prices[] = rand(1, 100);
}
var_dump(priceBuckets($prices));

From the result, you can use reset and end to get the min/max of each bucket

Kinda brute force, but might be useful...

like image 89
tsgrasser Avatar answered Oct 21 '22 07:10

tsgrasser


Here is an idea, following the line of thought of my comment:

I assume you have a set of products, each of them tagged by a price and a sales volume estimate (as a percent from the total sales). First, sort all products by their price. Next, start splitting: traverse the ordered list, and accumulate sales volume. Each time you reach about 25%, cut there. If you do so 3 times, it will result in 4 subsets having disjoint price ranges, and a similar sales volume.

like image 27
Eyal Schneider Avatar answered Oct 21 '22 07:10

Eyal Schneider