
Array values not identical (but they are?)

Tags: arrays, php

I have two arrays. They seem to contain at least one identical value, but array_intersect() returns nothing even though I think it should! This should have been just routine code, but for some reason it's not liking what I've done.

The weird thing is that var_dump($queue[0]); returns string(167) and var_dump($videos[0]); returns string(168).

So clearly, they must be different, right?

echo similar_text($queue[0], $videos[0]); returns 167. What!?

Note: These are just file names and do not represent the contents of the file.

Videos Array

Array ( [0] => /var/www/downloads/j2/Dexter Season 1, 2, 3, 4, 5 & 6 + Extras (Early Cuts, Audiobooks etc) DVDRip HDTV TSV/Season 3/Dexter Season 3 Episode 04 - All in the Family.avi )

Queue Array

Array ( [0] => /var/www/downloads/j2/Dexter Season 1, 2, 3, 4, 5 & 6 + Extras (Early Cuts, Audiobooks etc) DVDRip HDTV TSV/Season 3/Dexter Season 3 Episode 04 - All in the Family.avi [1] => j2 )

Outputs

$diff = array_intersect($queue,$videos); print_r($diff); returns Array ( )

var_dump($queue[0]); returns string(167) "/var/www/downloads/j2/Dexter Season 1, 2, 3, 4, 5 & 6 + Extras (Early Cuts, Audiobooks etc) DVDRip HDTV TSV/Season 3/Dexter Season 3 Episode 04 - All in the Family.avi"

var_dump($videos[0]); returns string(168) "/var/www/downloads/j2/Dexter Season 1, 2, 3, 4, 5 & 6 + Extras (Early Cuts, Audiobooks etc) DVDRip HDTV TSV/Season 3/Dexter Season 3 Episode 04 - All in the Family.avi"

echo similar_text($queue[0], $videos[0]); returns 167.

I've put the strings into JavaScript character counts, I've used strlen(), trim() to trim whitespace, I've even manually counted each character individually. What's going on?

Jimbo asked Sep 20 '12 13:09


1 Answer

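Converting both strings to hex-escaped form makes the difference visible:

// Replace every byte with its \xNN escape; the "s" flag lets "." match newlines too.
var_dump(preg_replace_callback('#.#s', function ($m) {
  // Zero-pad to two hex digits so every byte has the same width.
  return sprintf('\\x%02x', ord($m[0]));
}, $input));

The resulting strings look like this: http://jsfiddle.net/mgaWn/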

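Looking at them in that form shows that the first string contains 5,·6·+·Extras while the second contains 5,·6··+·Extras (· marks a space): there's a double space before the + sign. That one extra character is the difference between string(167) and string(168), and it is also why similar_text() reports 167: it counts the characters the two strings have in common, which here is the entire shorter string.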
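To see the same symptoms in isolation, here is a minimal sketch using hypothetical, shortened stand-ins for the two file names (the real paths from the question are not reproduced). It assumes the only difference is the extra space before the + sign:

```php
<?php
// Hypothetical, shortened stand-ins for the two file names in the question.
$queue  = ['5, 6 + Extras.avi', 'j2'];  // single space before "+"
$videos = ['5, 6  + Extras.avi'];       // double space before "+"

// The lengths differ by exactly one byte, just like 167 vs. 168.
$lengths = [strlen($queue[0]), strlen($videos[0])];

// similar_text() counts matching characters, so it can equal the shorter
// string's length even though the two strings are not identical.
$matching = similar_text($queue[0], $videos[0]);

// array_intersect() only keeps values whose string forms are exactly equal.
$common = array_intersect($queue, $videos);

var_dump($lengths, $matching, $common);
```

Run from the command line, the dump should show lengths that differ by one, a similar_text() count equal to the shorter length, and an empty intersection, which mirrors what the question reports.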

HTML collapses whitespace and this difference becomes completely invisible. It is generally a good idea to compare the data as close to its original format as possible, before any output format specifics (such as character encodings or this HTML whitespace minimization) get in your way.
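One way to follow that advice is to inspect the raw bytes instead of the rendered page. The sketch below is not part of the original answer: firstDifference() is a hypothetical helper name, and the sample strings are shortened stand-ins. It finds the byte offset where two strings diverge and prints the surrounding bytes in hex with bin2hex():

```php
<?php
// Hypothetical helper: returns the byte offset of the first difference
// between two strings, or null if they are identical.
function firstDifference(string $a, string $b): ?int
{
    if ($a === $b) {
        return null;
    }
    $limit = min(strlen($a), strlen($b));
    for ($i = 0; $i < $limit; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    // One string is a prefix of the other; they diverge where the shorter ends.
    return $limit;
}

// Shortened stand-ins for the two file names (assumption, not the real data).
$a = '5, 6 + Extras';
$b = '5, 6  + Extras';

$offset = firstDifference($a, $b);

// Hex of a few bytes around the mismatch; 20 is a space, 2b is "+".
$context = bin2hex(substr($b, max(0, $offset - 2), 5));

var_dump($offset, $context);
```

If the output has to go through a browser, wrapping it as echo '<pre>' . htmlspecialchars($value) . '</pre>'; keeps consecutive spaces from being collapsed, though the hex form is harder to misread.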

DCoder answered Sep 28 '22 21:09