 

How to normalize Perl function arguments for memoization?

Tags:

perl

How can I normalize a list of function arguments to a string, such that two argument lists convert to the same string if and only if they are effectively equivalent? The algorithm should:

  1. Compare embedded hashes and lists deeply, rather than by reference
  2. Ignore hash key order
  3. Ignore difference between 3 and "3"
  4. Generate a relatively readable string (not required, but nice-to-have for debugging)
  5. Perform well (XS preferred over Perl)

This is necessary for memoization, i.e. caching the result of the function based on its arguments.

As a strawman example, Memoize uses this as its default normalizer, which fails #1, since a reference stringifies to its address (e.g. HASH(0x...)) rather than its contents:

$argstr = join chr(28),@_;  
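To make the strawman concrete, here is a minimal sketch using the core Memoize module. default_key and total are hypothetical names made up for illustration. NORMALIZER is Memoize's option for supplying your own key function, and the normalizer shown here only handles one specific argument shape (a single flat hashref of scalars), not the general case the question asks about.

```perl
use strict;
use warnings;
use Memoize;

# The default key from the question: join all arguments with chr(28).
sub default_key { return join chr(28), @_ }

# Two live hashes with equal contents still stringify to different
# addresses (HASH(0x...)), so they produce different cache keys (#1).
my ($h1, $h2) = ({ a => 1, b => 2 }, { b => 2, a => 1 });
print default_key($h1) eq default_key($h2) ? "same key\n" : "different key\n";
# prints "different key"

# Hypothetical function taking one flat hashref; $calls counts real calls.
my $calls = 0;
sub total {
    my ($h) = @_;
    $calls++;
    my $sum = 0;
    $sum += $_ for values %$h;
    return $sum;
}

# Custom NORMALIZER for this one argument shape: sorted key=value pairs.
memoize('total',
    NORMALIZER => sub { my ($h) = @_; join chr(28), map { "$_=$h->{$_}" } sort keys %$h });

my $x = total($h1);
my $y = total($h2);    # same normalized key, so this is served from the cache
print "calls: $calls\n";
# prints "calls: 1"
```

A hand-written normalizer like this works when the argument shape is known in advance; the rest of the question is about finding one that works for arbitrary nested data.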

For a while my go-to normalizer was

JSON::XS->new->utf8->canonical

However, it treats the number 3 and the string "3" differently, depending on whether the scalar was last used in a numeric or a string context. This can generate different strings for essentially equivalent argument lists and reduce the memoization benefit. (The vast majority of functions won't know or care whether they get 3 or "3".)
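The behaviour is easy to reproduce. This sketch uses JSON::PP, the pure-Perl core module that shares the new/utf8/canonical/encode interface with JSON::XS, so it runs without a compiler; the assumption is that JSON::XS prints the same strings here, which matches the comparison table below for both modules.

```perl
use strict;
use warnings;
use JSON::PP;    # stand-in for JSON::XS; same constructor and option methods

my $json = JSON::PP->new->utf8->canonical;

# canonical sorts hash keys, so key order (#2) is already handled:
print $json->encode([{ a => 1, b => 2 }]), "\n";    # [{"a":1,"b":2}]
print $json->encode([{ b => 2, a => 1 }]), "\n";    # [{"a":1,"b":2}]

# ...but #3 is not: a number and a string holding the same digits differ.
print $json->encode([3]),   "\n";    # [3]
print $json->encode(["3"]), "\n";    # ["3"]
```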

For fun I looked at a bunch of serializers to see which ones differentiate 3 and "3":

Data::Dump   : equal - [3] vs [3]
Data::Dumper : not equal - [3] vs ['3']
FreezeThaw   : equal - FrT;@1|@1|$1|3 vs FrT;@1|@1|$1|3
JSON::PP     : not equal - [3] vs ["3"]
JSON::XS     : not equal - [3] vs ["3"]
Storable     : not equal - <unprintable>
YAML         : equal - ---\n- 3\n vs ---\n- 3\n
YAML::Syck   : equal - --- \n- 3\n vs --- \n- 3\n
YAML::XS     : not equal - ---\n- 3\n vs ---\n- '3'\n

Of the ones that report "equal", I'm not sure how to get them to ignore hash key order.

I could walk the argument list beforehand and stringify all numbers, but this would require making a deep copy and would violate #5.
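One way to avoid the deep copy is to skip the generic serializer and emit the string directly while walking the data. The following is a minimal pure-Perl sketch, not a library API; normalize_args and _norm are hypothetical names. It covers #1 to #4 by quoting every plain scalar as a string and sorting hash keys, but it is not XS, so it may not meet #5, and it has no cycle detection.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical normalizer: builds the key string directly, no copied data.
sub normalize_args {
    return join ',', map { _norm($_) } @_;
}

sub _norm {
    my ($v) = @_;    # copy, so stringifying below never touches caller data
    return 'undef' unless defined $v;
    my $type = reftype($v);    # blessed refs are handled by underlying type
    if (!defined $type) {
        # Plain scalar: always emit its string form, so 3 and "3" match.
        (my $s = "$v") =~ s/(["\\])/\\$1/g;    # escape quotes and backslashes
        return qq{"$s"};
    }
    if ($type eq 'ARRAY') {
        return '[' . join(',', map { _norm($_) } @$v) . ']';
    }
    if ($type eq 'HASH') {
        # Sort keys so hash order does not affect the result.
        return '{' . join(',', map { _norm($_) . ':' . _norm($v->{$_}) } sort keys %$v) . '}';
    }
    # Other reference types (CODE, SCALAR, GLOB, ...) fall back to identity.
    return "$v";
}

print normalize_args(1, { b => 2, a => [3] }), "\n";    # "1",{"a":["3"],"b":"2"}
```

It could then be plugged in with memoize('slow_function', NORMALIZER => \&normalize_args), where slow_function stands for your own function. Only values that stringify identically are merged: a numeric 3.0 becomes "3", but the string "3.0" stays "3.0".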

Thanks!

Jonathan Swartz, asked May 31 '12 13:05


1 Answer

Pretty much any serializer can treat 3 and "3" differently, because it has no way of knowing that a number and its stringified form are the same thing for your purposes, and that assumption is false for data in general. You must normalize either the input or the output yourself.

For input, a deep scan that replaces every numeric-looking string with its value + 0 will do. If you know exactly where numbers can appear in the input, you can shorten this scan considerably.
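A minimal sketch of the input-side approach, assuming core Scalar::Util's looks_like_number as the test for "stringified number" and JSON::PP as the serializer (numify and normalize_args are hypothetical names). It builds a numified copy rather than changing the arguments in place, because @_ aliases the caller's variables; that copy is the cost the question's #5 worries about.

```perl
use strict;
use warnings;
use JSON::PP;
use Scalar::Util qw(looks_like_number reftype);

# Returns a numified deep copy; the caller's data is left untouched.
# Note looks_like_number also accepts forms like "1e3" (becomes 1000).
sub numify {
    my ($v) = @_;
    return $v unless defined $v;
    my $type = reftype($v);
    if (!defined $type) {
        return looks_like_number($v) ? $v + 0 : $v;
    }
    return [ map { numify($_) } @$v ] if $type eq 'ARRAY';
    # +{ forces an anonymous hash; {; forces map to read a block.
    # Blessed hashes come back unblessed, which JSON::PP can encode.
    return +{ map {; $_ => numify($v->{$_}) } keys %$v } if $type eq 'HASH';
    return $v;    # other reference types are passed through unchanged
}

my $json = JSON::PP->new->utf8->canonical;

sub normalize_args { return $json->encode(numify([@_])) }

print normalize_args({ b => "2", a => ["3", "x"] }), "\n";    # [{"a":[3,"x"],"b":2}]
```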

For output, a simple state machine or even a regexp (yes, I know the output is not a regular language) will most likely be enough to turn number-only string values back into bare numbers.
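A minimal sketch of the output-side approach, again with JSON::PP standing in for JSON::XS (normalize_args is a hypothetical name). Instead of a naive regexp over the whole output, it matches each complete JSON string token, honouring backslash escapes, and drops the quotes only when the token's body is a plain decimal number, so digits inside a longer string are left alone.

```perl
use strict;
use warnings;
use JSON::PP;

my $json = JSON::PP->new->utf8->canonical;    # sorted keys handle #2

sub normalize_args {
    my $out = $json->encode([@_]);
    # Visit every JSON string token ("...", with \-escapes) and unquote it
    # when its body is an integer or simple decimal such as 3 or -1.5.
    $out =~ s{"((?:[^"\\]|\\.)*)"}{
        my $body = $1;
        $body =~ /\A-?\d+(?:\.\d+)?\z/ ? $body : qq{"$body"}
    }ge;
    return $out;
}

print normalize_args({ b => "2", a => 1 }), "\n";    # [{"a":1,"b":2}]
```

Hash keys made only of digits are unquoted too ({"3":1} becomes {3:1}). The result is no longer valid JSON, but since every such key is treated the same way it still works as a cache key.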

Oleg V. Volkov, answered Nov 06 '22 16:11