
array_unique with SORT_NUMERIC behaviour

I've stumbled upon something weird and I don't understand why it works that way.

I have an array of numbers, all of them unique:

$array = [
    98602142989816970,
    98602142989816971,
    98602142989816980,
    98602142989816981,
    98602142989816982,
    98602142989816983,
    98602142989820095,
    98602142989820096,
    98602142989822060,
    98602142989822061,
];
var_dump($array);
array(10) {
  [0]=>
  int(98602142989816970)
  [1]=>
  int(98602142989816971)
  [2]=>
  int(98602142989816980)
  [3]=>
  int(98602142989816981)
  [4]=>
  int(98602142989816982)
  [5]=>
  int(98602142989816983)
  [6]=>
  int(98602142989820095)
  [7]=>
  int(98602142989820096)
  [8]=>
  int(98602142989822060)
  [9]=>
  int(98602142989822061)
}

If I do print_r(array_unique($array)); everything is fine, I get:

Array
(
    [0] => 98602142989816970
    [1] => 98602142989816971
    [2] => 98602142989816980
    [3] => 98602142989816981
    [4] => 98602142989816982
    [5] => 98602142989816983
    [6] => 98602142989820095
    [7] => 98602142989820096
    [8] => 98602142989822060
    [9] => 98602142989822061
)

But if I add the SORT_NUMERIC flag, as in print_r(array_unique($array, SORT_NUMERIC));, I get:

Array
(
    [0] => 98602142989816970
    [6] => 98602142989820095
    [8] => 98602142989822060
)

Why are only those 3 numbers returned?

Update: I'm on a 64-bit system.

For the sort functions, I manually shuffled some of the values, because in the original array they are already sorted.

If I do sort($array); then the result is as expected:

Array
(
    [0] => 98602142989816970
    [1] => 98602142989816971
    [2] => 98602142989816980
    [3] => 98602142989816981
    [4] => 98602142989816982
    [5] => 98602142989816983
    [6] => 98602142989820095
    [7] => 98602142989820096
    [8] => 98602142989822060
    [9] => 98602142989822061
)

But with sort($array, SORT_NUMERIC);, they are sorted incorrectly:

Array
(
    [0] => 98602142989816970
    [1] => 98602142989816982
    [2] => 98602142989816983
    [3] => 98602142989816980
    [4] => 98602142989816981
    [5] => 98602142989816971
    [6] => 98602142989820095
    [7] => 98602142989820096
    [8] => 98602142989822060
    [9] => 98602142989822061
)

asked Feb 20 '20 10:02 by Arthur Shveida


1 Answer

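You're running into the limits of floating-point precision at that scale: a double has a 53-bit significand, so it cannot represent every integer above 2^53 (about 9 × 10^15), and your values are roughly ten times larger than that. There's a lot more background in the question "Is floating point math broken?" if you're interested, but I don't think this quite counts as a duplicate of that.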

Taking your first two numbers:

php > var_dump((float) 98602142989816970 === (float) 98602142989816971);
bool(true)

php > var_dump((float) 98602142989816970, (float) 98602142989816971);
float(9.8602142989817E+16)
float(9.8602142989817E+16)

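Internally, this is what happens when PHP compares the values in your array using SORT_NUMERIC: deep down in numeric_compare_function, the operands are compared as floats, so distinct integers that convert to the same float are treated as equal.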
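
To see why exactly three values survive, here is a minimal sketch that groups the integers from the question by the float each one converts to. It assumes a 64-bit PHP build; the variable names ($groups, $asFloat) are illustrative only, and the explicit cast merely stands in for the internal conversion rather than reproducing PHP's actual implementation.

```php
<?php
$array = [
    98602142989816970,
    98602142989816971,
    98602142989816980,
    98602142989816981,
    98602142989816982,
    98602142989816983,
    98602142989820095,
    98602142989820096,
    98602142989822060,
    98602142989822061,
];

// Group the original keys by the double each integer converts to.
// In this range (between 2^56 and 2^57) adjacent doubles are 16 apart,
// so every integer is rounded to the nearest multiple of 16.
$groups = [];
foreach ($array as $key => $value) {
    // Cast back to int only to get an exact, printable array key
    // (assumption: 64-bit PHP, where these values fit in an int).
    $asFloat = (int) (float) $value;
    $groups[$asFloat][] = $key;
}

print_r($groups);
// Three groups:
//   98602142989816976 => keys 0 to 5
//   98602142989820096 => keys 6 and 7
//   98602142989822064 => keys 8 and 9
```

The first key of each group (0, 6 and 8) matches the keys that array_unique($array, SORT_NUMERIC) returned in the question.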

sort suffers from the same issue; see https://3v4l.org/02UUB. (Obviously no values are removed from the array, since that only happens in array_unique; they just aren't sorted properly.)

In short, with numbers this size (or specifically numbers that are very close together relative to their scale), SORT_NUMERIC isn't going to be reliable. Stick with comparing them as strings if you can.
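
As a rough sketch of that advice, the snippet below deduplicates with the default SORT_STRING flag and sorts with the default SORT_REGULAR flag, which are the two calls the question already reports as working. The shortened sample array and the deliberate duplicate are made up for illustration, and the values are assumed to be PHP ints on a 64-bit build.

```php
<?php
$array = [
    98602142989816971,
    98602142989816970,
    98602142989816970, // deliberate duplicate, added for this example
    98602142989820096,
    98602142989820095,
];

// SORT_STRING (the default for array_unique) compares the exact decimal
// string form of each value, so nothing is rounded to a float.
$unique = array_unique($array, SORT_STRING);

// Plain sort() uses SORT_REGULAR; two ints are compared as ints.
$sorted = array_values($unique);
sort($sorted);

print_r($sorted);
// Array
// (
//     [0] => 98602142989816970
//     [1] => 98602142989816971
//     [2] => 98602142989820095
//     [3] => 98602142989820096
// )
```

Only the real duplicate is dropped, and neighbouring values that would collapse to the same float stay distinct and correctly ordered.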

answered Sep 19 '22 17:09 by iainn