
In PHP how do I remove duplicates in an array of objects where a duplicate is defined as a subset of key-value pairs having the same value [duplicate]

I have an array of the form:

class anim {
    public $qs;
    public $dp;
    public $cg;
    public $timestamp;
}
$animArray = array();

$myAnim = new anim();
$myAnim->qs = "fred";
$myAnim->dp = "shorts";
$myAnim->cg = "dino";
$myAnim->timestamp = 1590157029399;
$animArray[] = $myAnim;

$myAnim = new anim();
$myAnim->qs = "barney";
$myAnim->dp = "tshirt";
$myAnim->cg = "bird";
$myAnim->timestamp = 1590133656330;
$animArray[] = $myAnim;

$myAnim = new anim();
$myAnim->qs = "fred";
$myAnim->dp = "tshirt";
$myAnim->cg = "bird";
$myAnim->timestamp = 1590117032286;
$animArray[] = $myAnim;

How do I create a new array containing only the non-duplicates (and the latest entry where duplicates are found) of $animArray, where a duplicate is defined as:

one where $myAnim->dp has the same value as that of another array element's $myAnim->dp AND the $myAnim->cg from the first and the $myAnim->cg from the second have the same value as each other.

In the example above, only the first element is unique by that definition.

I'm hoping there's an elegant solution. I've been through all the array functions in the PHP manual but can't see how it could be achieved.

I could loop through each array element checking if $myAnim->dp has the same value as that of another array element's $myAnim->dp, saving the matches into a new array and then looping through that new array, checking for its $myAnim->cg matching the $myAnim->cg of any other element in that new array.

A more elegant solution would allow me to change which combination of key-value pairs determines whether there's a duplicate, without having to rewrite much code.

Does such a solution exist?

Thanks for helping this novice :)

Mark Highton Ridley asked Sep 11 '25 17:09


2 Answers

While there is nothing built in that can be used directly, not much code is needed to handle an arbitrary number of properties to consider for uniqueness. By keeping track of each unique property value in a lookup array, we can build a nested array whose leaf nodes (i.e. the ones that aren't arrays themselves) are the objects.

We do this by keeping a reference (&) to the current level in the array, then continue building our lookup array for each property.

function find_uniques($list, $properties) {
    $lookup = [];
    $unique = [];
    $last_idx = count($properties) - 1;

    // Build our lookup array - the leaf nodes will be the items themselves,
    // located on a level that matches the number of properties to look at
    // to consider a duplicate
    foreach ($list as $item) {
        $current = &$lookup;

        foreach ($properties as $idx => $property) {
            // last level, keep object for future reference
            if ($idx == $last_idx) {
                $current[$item->$property] = $item;
                break;
            } else if (!isset($current[$item->$property])) {
                // otherwise, if not already set, create empty array
                $current[$item->$property] = [];
            }

            // next iteration starts on this level as its current level
            $current = &$current[$item->$property];
        }
    }

    // array_walk_recursive only calls the callback for leaf nodes - i.e. our items.
    array_walk_recursive($lookup, function ($item) use (&$unique) {
        $unique[] = $item;
    });

    return $unique;
}
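
To make the nested lookup more concrete, here is a minimal sketch that builds it by hand for the two properties dp and cg and then prints its shape. The stdClass sample items and the $shape summary are illustrative assumptions, not part of the function above.

```php
<?php
// Hypothetical sample data mirroring the question: three items compared on dp and cg.
$items = [
    (object) ['qs' => 'fred',   'dp' => 'shorts', 'cg' => 'dino'],
    (object) ['qs' => 'barney', 'dp' => 'tshirt', 'cg' => 'bird'],
    (object) ['qs' => 'fred',   'dp' => 'tshirt', 'cg' => 'bird'],
];

$lookup = [];
foreach ($items as $item) {
    // Start at the top level and walk down one level per property.
    $current = &$lookup;
    if (!isset($current[$item->dp])) {
        $current[$item->dp] = [];
    }
    $current = &$current[$item->dp];

    // Last level: a later item overwrites an earlier one with the same dp and cg.
    $current[$item->cg] = $item;
    unset($current); // drop the reference before handling the next item
}

// Summarise the structure as dp => cg => qs so it is easy to read.
$shape = [];
foreach ($lookup as $dp => $byCg) {
    foreach ($byCg as $cg => $item) {
        $shape[$dp][$cg] = $item->qs;
    }
}
print_r($shape);
```

With this sample data the tshirt/bird branch ends up holding the second fred item, because it overwrote barney's item at the last level. That is why the last element of each group of duplicates is the one returned.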

Called with your data above, and with the requirement that unique items and the last element (in array order) of each group of duplicates are returned, we get the following result:

var_dump(find_uniques($animArray, ['dp', 'cg']));

array(2) {
  [0] =>
  class anim#1 (4) {
    public $qs =>
    string(4) "fred"
    public $dp =>
    string(6) "shorts"
    public $cg =>
    string(4) "dino"
    public $timestamp =>
    int(1590157029399)
  }
  [1] =>
  class anim#3 (4) {
    public $qs =>
    string(4) "fred"
    public $dp =>
    string(6) "tshirt"
    public $cg =>
    string(4) "bird"
    public $timestamp =>
    int(1590117032286)
  }
}

This maps to elements [0] and [2] in your example. If you instead want to keep the first object among duplicates, add an isset check at the last level so that a combination that has already been seen is not overwritten:

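foreach ($properties as $idx => $property) {
    if ($idx == $last_idx) {
        // last level: only store the item if this combination is new
        if (!isset($current[$item->$property])) {
            $current[$item->$property] = $item;
        }
        break;
    } else if (!isset($current[$item->$property])) {
        // otherwise, if not already set, create empty array
        $current[$item->$property] = [];
    }

    // next iteration starts on this level as its current level
    $current = &$current[$item->$property];
}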

Note that this assumes the items you want to check for uniqueness are not arrays themselves (since we look up properties with -> and rely on array_walk_recursive to find anything that isn't an array).
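
If the items may be arrays, one alternative is a flat lookup keyed by the combined property values. This is a different technique from the nested lookup above, sketched under assumptions: unique_by_fields is a hypothetical name, the compared values are assumed to be scalars that json_encode can encode, and the last item per combination wins.

```php
<?php
// Hypothetical helper: de-duplicate by a flat composite key instead of a nested lookup.
// Accepts items that are objects (public properties) or associative arrays.
function unique_by_fields(array $list, array $fields): array {
    $seen = [];
    foreach ($list as $item) {
        $row = (array) $item; // public properties become array keys
        $values = [];
        foreach ($fields as $field) {
            $values[] = $row[$field] ?? null;
        }
        // json_encode keeps value boundaries, so ['a-b', 'c'] and ['a', 'b-c'] get different keys.
        $seen[json_encode($values)] = $item;
    }
    // Drop the composite keys and return a plain list.
    return array_values($seen);
}

$rows = [
    ['qs' => 'fred',   'dp' => 'shorts', 'cg' => 'dino'],
    ['qs' => 'barney', 'dp' => 'tshirt', 'cg' => 'bird'],
    ['qs' => 'fred',   'dp' => 'tshirt', 'cg' => 'bird'],
];

$result = unique_by_fields($rows, ['dp', 'cg']);
echo count($result), "\n";
```

Changing which properties define a duplicate is then only a matter of passing a different list as the second argument.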

MatsLindh answered Sep 14 '25 05:09


This was fun:

array_multisort(array_column($animArray, 'timestamp'), SORT_DESC, $animArray);

$result = array_intersect_key($animArray,
          array_unique(array_map(function($v) { return $v->dp.'-'.$v->cg; }, $animArray)));
  • First, extract the timestamp and sort that array descending, thereby sorting the original array.
  • Then, map to create a new array using the dp and cg combinations.
  • Next, make the combination array unique, which keeps the first occurrence of each combination (that's why we sorted descending).
  • Finally, get the intersection of keys of the original array and the unique one.
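
The steps above can be followed one at a time by keeping each intermediate array in its own variable. This is a sketch using stdClass stand-ins for the question's data; $timestamps, $combos and $uniqueCombos are names introduced here for illustration.

```php
<?php
// Stand-ins for the three anim objects from the question.
$items = [
    (object) ['qs' => 'fred',   'dp' => 'shorts', 'cg' => 'dino', 'timestamp' => 1590157029399],
    (object) ['qs' => 'barney', 'dp' => 'tshirt', 'cg' => 'bird', 'timestamp' => 1590133656330],
    (object) ['qs' => 'fred',   'dp' => 'tshirt', 'cg' => 'bird', 'timestamp' => 1590117032286],
];

// Step 1: sort the items newest-first by sorting the timestamp column alongside them.
$timestamps = array_column($items, 'timestamp');
array_multisort($timestamps, SORT_DESC, $items);

// Step 2: one "dp-cg" string per item, in the same (sorted) order.
$combos = array_map(function ($v) { return $v->dp . '-' . $v->cg; }, $items);

// Step 3: array_unique keeps the key of the first occurrence of each string.
$uniqueCombos = array_unique($combos);

// Step 4: keep only the items whose keys survived step 3.
$result = array_intersect_key($items, $uniqueCombos);

print_r($combos);
print_r(array_keys($uniqueCombos));
```

With this data the newest tshirt/bird item is barney's, so $result holds the shorts/dino item and barney's item under keys 0 and 1.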

In a function with dynamic properties:

function array_unique_custom($array, $props) {

    array_multisort(array_column($array, 'timestamp'), SORT_DESC, $array);

    $result = array_intersect_key($array,
              array_unique(array_map(function($v) use ($props) {
                  return implode('-', array_map(function($p) use($v) { return $v->$p; }, $props));
              },
              $array)));

    return $result;
}
$result = array_unique_custom($animArray, ['dp', 'cg']);

Another option would be to sort it ascending and then build an array with a dp and cg combination as the key, which will keep the last duplicate:

array_multisort(array_column($animArray, 'timestamp'), SORT_ASC, $animArray);

$result = [];
foreach($animArray as $v) {
    $result[$v->dp.'-'.$v->cg] = $v;
}

In a function with dynamic properties:

function array_unique_custom($array, $props) {

    array_multisort(array_column($array, 'timestamp'), SORT_ASC, $array);

    $result = [];
    foreach($array as $v) {
        // join with '-' so that e.g. "ab"+"c" and "a"+"bc" produce different keys
        $key = implode('-', array_map(function($p) use($v) { return $v->$p; }, $props));
        $result[$key] = $v;
    }
    return $result;
}
$result = array_unique_custom($animArray, ['dp', 'cg']);
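
Note that the two approaches index their results differently: array_intersect_key keeps the sorted numeric keys, possibly with gaps, while the loop above keys the result by the combination string. A small sketch, again with stdClass stand-ins for the question's data, showing the string keys and how array_values turns them into a plain list:

```php
<?php
// Stand-ins for the three anim objects from the question.
$items = [
    (object) ['qs' => 'fred',   'dp' => 'shorts', 'cg' => 'dino', 'timestamp' => 1590157029399],
    (object) ['qs' => 'barney', 'dp' => 'tshirt', 'cg' => 'bird', 'timestamp' => 1590133656330],
    (object) ['qs' => 'fred',   'dp' => 'tshirt', 'cg' => 'bird', 'timestamp' => 1590117032286],
];

// Oldest first, so that a later (newer) item overwrites an older one with the same key.
$timestamps = array_column($items, 'timestamp');
array_multisort($timestamps, SORT_ASC, $items);

$result = [];
foreach ($items as $v) {
    $result[$v->dp . '-' . $v->cg] = $v;
}

// The keys are the combination strings, in order of first appearance.
print_r(array_keys($result));

// array_values discards the string keys and reindexes from 0.
$list = array_values($result);
```

Here the newest tshirt/bird item (barney's) is the one kept, and $list is an ordinary zero-indexed array.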

AbraCadaver answered Sep 14 '25 06:09