
Hash key encoding: Why does Devel::Peek::Dump give two different results here?

Why does Devel::Peek::Dump give two different results for the keys of these two hashes?

#!/usr/bin/env perl
use warnings;
use 5.014;
use utf8;
binmode STDOUT, ':encoding(utf-8)';
use Devel::Peek;

my %hash1 = ( 'müller' => 1 );
say Dump $_ for keys %hash1;

my %hash2;
$hash2{'müller'} = 1;
say Dump $_ for keys %hash2;

Output:

SV = PV(0x753270) at 0x76d230
  REFCNT = 2
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x759750 "m\303\274ller"\0 [UTF8 "m\x{fc}ller"]
  CUR = 7
  LEN = 8

SV = PV(0x753270) at 0x7d75a8
  REFCNT = 2
  FLAGS = (POK,FAKE,READONLY,pPOK)
  PV = 0x799110 "m\374ller"
  CUR = 6
  LEN = 0
sid_com asked Dec 07 '11 16:12


2 Answers

Both of those scalars contain exactly the same string. The only difference is in how the string is stored internally.

My guess is that the key is normalised to make comparisons easier when trying to locate the key in the hash.
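To illustrate the "same string, different storage" point, here is a minimal sketch. It assumes nothing beyond core Perl; the variable names are made up for the example, and `"\x{fc}"` is used instead of a literal `ü` so the result does not depend on the source file's encoding. It uses `utf8::upgrade` and `utf8::downgrade` to force the two internal representations by hand, which is not necessarily what the hash code does internally.

```perl
use strict;
use warnings;

# Two copies of the same six-character string "müller".
my $upgraded   = "m\x{fc}ller";
my $downgraded = "m\x{fc}ller";

utf8::upgrade($upgraded);      # internal storage: UTF-8 (7 bytes), SvUTF8 on
utf8::downgrade($downgraded);  # internal storage: 1 byte per char, SvUTF8 off

# The internal flag differs ...
my $flags_differ = (utf8::is_utf8($upgraded) xor utf8::is_utf8($downgraded)) ? 1 : 0;

# ... but at the Perl level the strings are identical.
my $same_string = $upgraded eq $downgraded ? 1 : 0;
my $same_length = length($upgraded) == length($downgraded) ? 1 : 0;

# Both representations address the same hash entry.
my %h;
$h{$upgraded}   = 1;
$h{$downgraded} = 2;
my $key_count = scalar keys %h;

print "flags differ: $flags_differ\n";
print "same string:  $same_string\n";
print "same length:  $same_length\n";
print "hash keys:    $key_count\n";
```

All four lines print `1`: the flag differs, yet the strings compare equal, have the same length, and collapse to a single hash key.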

ikegami answered Sep 24 '22 00:09


This is not an answer; I believe ikegami's response is correct. I just wanted to add some observations, with code.

I ran the following code on Perl 5.10 through 5.15, and the behavior is consistent.

use utf8;
use Test::More;

{
    my %h = ('müller' => 1);
    my $k = (keys %h)[0];
    ok(utf8::is_utf8($k), 'UTF-8 Latin-1 hash key has SvUTF8 set');
}

{
    my %h = ('müller' => 1);
       $h{'müller'} = 2;
    my $k = (keys %h)[0];
    ok( ! utf8::is_utf8($k), 'UTF-8 Latin-1 hash key does not have SvUTF8 set after assignment');
}

{
    my %h = ('☺' => 1);
       $h{'☺'} = 2;
    my $k = (keys %h)[0];
    ok(utf8::is_utf8($k), 'UTF-8 (> Latin-1) hash key has SvUTF8 set after assignment');
}

done_testing;

If the result of the second test is expected, it would be the first silent downgrade I'm aware of. I guess p5p has the final say on whether this is an optimization bug or expected behavior. (The sv_dump output, with its flags (POK,FAKE,READONLY,pPOK), looks like an optimization.)
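Whatever p5p's verdict is, the downgrade should not be observable from pure Perl code, apart from `utf8::is_utf8` and `Devel::Peek`. The following is a rough sketch of one way to check that, assuming the core `Encode` module; the variable names are made up for the example, and the key is written as `"\x{fc}"` and upgraded by hand to mimic a literal under `use utf8`. It deliberately does not assert the state of the flag on the returned key.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Build an upgraded (SvUTF8 on) key, as a literal under "use utf8" would be.
my $literal = "m\x{fc}ller";
utf8::upgrade($literal);

# Initialise the hash, then assign to the same key again, as in the second test.
my %h = ($literal => 1);
$h{$literal} = 2;
my ($key) = keys %h;

# Compare the key that came back with the upgraded original.
my $equal_as_strings = $key eq $literal ? 1 : 0;

# Encoding either one to UTF-8 yields the same 7 bytes.
my $key_bytes     = encode('UTF-8', $key);
my $literal_bytes = encode('UTF-8', $literal);
my $equal_encoded = $key_bytes eq $literal_bytes ? 1 : 0;

print "equal as strings: $equal_as_strings\n";
print "equal when encoded: $equal_encoded\n";
print "encoded length: ", length($key_bytes), "\n";
```

Here the key compares equal to the original string and encodes to the same seven UTF-8 bytes, whichever internal representation the hash handed back.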

chansen answered Sep 23 '22 00:09