
Why do `join` and/or `JSON::to_json` convert my data silently from integer to string?

Tags:

perl

I don't understand why join changes the output of JSON::to_json in the following example:

#!/usr/bin/perl
use v5.26;
use Data::Dumper;
use JSON;

my @version = (1, 2, 3, 4);

say "version: ", join ".", @version;    # comment this line out

$Data::Dumper::Terse = 1;
$Data::Dumper::Indent = 0;

say Dumper(\@version);
say to_json(\@version);

The output with the line containing join:

version: 1.2.3.4
[1,2,3,4]
["1","2","3","4"]

But when I comment out the line with join, the output of to_json suddenly shows integers instead of strings, although the output of Data::Dumper stays the same:

[1,2,3,4]
[1,2,3,4]
A.H. asked Dec 14 '20 11:12


2 Answers

When you stringify a number, the stringification is stored in the scalar along with the original number. (You can see a demonstration at the bottom of my answer.)

When you numify a string, the numification is stored in the scalar along with the original string.

This is an optimization, since one often stringifies or numifies a scalar more than once.

This isn't a problem for Perl since Perl has coercing operators rather than polymorphic operators. But it puts the authors of JSON serializers in the difficult position of either requiring additional information or guessing which of the values a scalar contains should be used.

You can force a number using $x = 0 + $x;.

You can force a string using $x = "$x";.
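
As a minimal sketch of these two normalizations, assuming the core module JSON::PP as the serializer (the question uses JSON, which may load a different backend), you can build fresh copies before encoding. The helper names as_numbers and as_strings are hypothetical, introduced only for this illustration.

```perl
use strict;
use warnings;
use JSON::PP;

my @version = (1, 2, 3, 4);
my $dotted  = join ".", @version;   # stringifies each element of @version

# Hypothetical helpers: return fresh copies that hold a single value type.
sub as_numbers { return [ map { 0 + $_ } @_ ] }   # same idea as $x = 0 + $x
sub as_strings { return [ map { "$_" } @_ ] }     # same idea as $x = "$x"

my $json         = JSON::PP->new;
my $numbers_json = $json->encode(as_numbers(@version));
my $strings_json = $json->encode(as_strings(@version));

print "$numbers_json\n";   # [1,2,3,4]
print "$strings_json\n";   # ["1","2","3","4"]
```

Because the copies are freshly created, it no longer matters how the original elements were used earlier in the program.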

More detailed answer follows.


Perl is free to change the internal format of a scalar as it sees fit. This is usually done as part of modifying the scalar.

$x = 123;          # $x contains a signed integer
$x += 0.1;         # $x contains a float

$x = 2147483647;   # $x contains a signed integer
++$x;              # $x contains an unsigned integer (on a build with 32-bit ints)

$x = "123";        # $x contains a downgraded string
$x += 0;           # $x contains a signed integer

$x = "abc";        # $x contains a downgraded string
$x .= "\x{2660}";  # $x contains an upgraded string

But sometimes, Perl adds a second value to a scalar as an optimization.

$x = 123;          # $x contains a signed integer
$x * 0.1;          # $x contains a signed integer AND a float

$x = 123;          # $x contains a signed integer
"$x";              # $x contains a signed integer AND a downgraded string

$x = "123";        # $x contains a downgraded string
$x+0;              # $x contains a signed integer AND a downgraded string

These aren't the only double (or triple) vars you'll encounter.

my $x = !!0;        # $x contains a signed integer AND a float AND a downgraded string
"$!";               # $! contains a float (not a signed integer?!) AND a downgraded string

This isn't a problem in Perl because we use type-coercing operators (e.g. == works on numbers, eq works on strings). But many other languages rely on polymorphic operators (e.g. == can be used to compare strings and to compare numbers).[1]

But it does present a problem for JSON serializers, which are forced to assign a single type to a scalar. If $x contains both a string and a number, which one should be used?

If the scalar is the result of stringification, using the number would be ideal, but if the scalar is the result of numification, the string would be ideal. There's no way to tell which of these origins pertains to a scalar (if any), so the module's author was left with a tough choice.

Ideally, they would have provided a different interface, but that could have added complexity and a performance penalty.


You can view the internals of a scalar using Devel::Peek's Dump. The relevant line is the FLAGS line.

  • IOK without IsUV: contains a signed integer
  • IOK with IsUV: contains an unsigned integer
  • NOK: contains a float
  • POK without UTF8: contains a downgraded string
  • POK with UTF8: contains an upgraded string
  • ROK: contains a reference

$ perl -MDevel::Peek -e'$x=123; Dump($x); "$x"; Dump($x);' 2>&1 |
   perl -M5.014 -ne'next if !/FLAGS/; say join ",", /\b([INPR]OK|IsUV|UTF8)/g'
IOK
IOK,POK

$ perl -MDevel::Peek -e'$x="123"; Dump($x); 0+$x; Dump($x);' 2>&1 |
   perl -M5.014 -ne'next if !/FLAGS/; say join ",", /\b([INPR]OK|IsUV|UTF8)/g'
POK
IOK,POK
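
The same flags can also be read from inside a program. The following is only a sketch, assuming the core B module; public_flags is a hypothetical helper name, and it reports just four of the flags listed above.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: names of the public type flags set on a scalar.
sub public_flags {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    my @names;
    push @names, "IOK" if $flags & B::SVf_IOK();
    push @names, "NOK" if $flags & B::SVf_NOK();
    push @names, "POK" if $flags & B::SVf_POK();
    push @names, "ROK" if $flags & B::SVf_ROK();
    return join ",", @names;
}

my $int    = 123;
my $str    = "123";
my $before = public_flags(\$str);
my $sum    = $str + 0;               # numifies $str; the number is cached in $str
my $after  = public_flags(\$str);

print "int:        ", public_flags(\$int), "\n";   # IOK
print "str before: $before\n";                     # POK
print "str after:  $after\n";                      # IOK,POK
```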

  1. Well, Perl doesn't have separate operators for the different numeric types, which can cause issues (e.g. -0 exists as a float, but not as an int), but these problems are seldom encountered.

    Another issue is that the stringification of floats often results in a loss of information.
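
    A small sketch of that loss of information, assuming a typical build where floats are 64-bit doubles (a build with long doubles may print something different):

```perl
use strict;
use warnings;

my $sum   = 0.1 + 0.2;   # not exactly 0.3 in binary floating point
my $text  = "$sum";      # the default stringification keeps a limited number of digits
my $round = $text + 0;   # numify the string again

print "text: $text\n";
print "the round trip ", ($round == $sum ? "preserved" : "changed"), " the value\n";
```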

ikegami answered Oct 17 '22 15:10


This is one of the very few times where you must maintain data purity in Perl. Once you create a variable of some type, you must never use it in a context of any other type. If you do need to, copy it to a new variable first to preserve the original.

use feature 'say';
use Data::Dumper;
use JSON;

my @version = (1, 2, 3, 4);

{ say "version: ", join ".", my @copy = @version; }

$Data::Dumper::Terse = 1;
$Data::Dumper::Indent = 0;

say Dumper(\@version);
say to_json(\@version);

Prints:

version: 1.2.3.4
[1,2,3,4]
[1,2,3,4]

I would also recommend using Cpanel::JSON::XS, because this is one area where pedantry is called for! It tries pretty hard to get the data types right, and its documentation has some discussion of the conversion issue.

HTH

lordadmira answered Oct 17 '22 16:10