Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Smart PHP compression code

So I saw this smart topic on "Programming puzzles and code golf": We're not strangers.... The best answer is PHP code printing the lyrics of Never Gonna Give You Up. It is only 543 bytes long.

I tried to understand this PHP code, but I can't get how this works. I think it's a grammar based compression, but I have no clue how one could use undeclared constants like in

<?php range('-', T);

So here is the code. How does this work?

<?=str_replace(range('-',T),split(q,"
I justCannaLE?2Gotta >u=Msta=.q
Ng1Nlet? downNrun<rH=5desMt?N>cryNsayRoodbyeNtE< lie5hurt?q

We'T3n each@Jor s8lSg6r hear9<ch: but6;Lo7hyL7BInsideCe both3Cha9Ro: S
We3KeRa45we;QplBq1)O)NgiT, nPgiT
(GqiT? upq howFJeel:
q knowqmeq<= q
YHq8sqo qt's beenqingq'req aqndqmake? q yHq othMqAqay it
q wqDqellq I'mqGqouqIq fqLhq tqerq
NPq
(OohqeTrQqRSna q gqonqve"),"We; n7trangMsL8loT63Ke rules5s8d8I
AJull commit4nt'sChatFKink: of6CHldn'tRetKisJrom<ny@Ruy-/A= if?<sk 42DS'tLE 4?;Lo8bli=L7ee..
O,R1)O,R001)/-..");

See it working on Ideone.

like image 905
Imateapot Avatar asked Aug 08 '13 13:08

Imateapot


2 Answers

Let's analyze the str_replace parameters one by one.

range('-',T)

The range() function returns an array that has the elements spanning from the first parameter to the second parameter. Characters are considered by their ASCII values, so the result is

Array
(
    [0] => -
    [1] => .
    [2] => /
    [3] => 0
    [4] => 1
    [5] => 2
    [6] => 3
    [7] => 4
    [8] => 5
    [9] => 6
    [10] => 7
    [11] => 8
    [12] => 9
    [13] => :
    [14] => ;
    [15] => <
    [16] => =
    [17] => >
    [18] => ?
    [19] => @
    [20] => A
    [21] => B
    [22] => C
    [23] => D
    [24] => E
    [25] => F
    [26] => G
    [27] => H
    [28] => I
    [29] => J
    [30] => K
    [31] => L
    [32] => M
    [33] => N
    [34] => O
    [35] => P
    [36] => Q
    [37] => R
    [38] => S
    [39] => T
)

Why T instead of "T"? PHP has a misfeature that makes it evaluate undefined constants as strings with the same content as the constant's name. The constant T isn't defined so it's the same as "T" which saves two characters for code golf purposes. The same goes for q later. If the server has error reporting on, it will show a warning about an undefined constant.

split(q,"I justCannaLE?2Gotta >u=Msta=.q...");

This splits the string into an array at the q characters. Again, this makes for shorter code than using an array literal. The result:

Array
(
    [0] => 
I justCannaLE?2Gotta >u=Msta=.
    [1] => 
Ng1Nlet? downNrun<rH=5desMt?N>cryNsayRoodbyeNtE< lie5hurt?
    [2] => 

We'T3n each@Jor s8lSg6r hear9<ch: but6;Lo7hyL7BInsideCe both3Cha9Ro: S
We3KeRa45we;QplB
    [3] => 1)O)NgiT, nPgiT
(G
    [4] => iT? up
    [5] =>  howFJeel:

    [6] =>  know
    [7] => me
    [8] => <= 
    [9] => 
YH
    [10] => 8s
    [11] => o 
    [12] => t's been
    [13] => ing
    [14] => 're
    [15] =>  a
    [16] => nd
    [17] => make? 
    [18] =>  yH
    [19] =>  othM
    [20] => A
    [21] => ay it

    [22] =>  w
    [23] => D
    [24] => ell
    [25] =>  I'm
    [26] => G
    [27] => ou
    [28] => I
    [29] =>  f
    [30] => Lh
    [31] =>  t
    [32] => er
    [33] => 
NP
    [34] => 
(Ooh
    [35] => eTrQ
    [36] => RSna 
    [37] =>  g
    [38] => on
    [39] => ve
)

The final parameter is the target string.

"We; n7trangMsL8loT63Ke rules5s8d8I
AJull commit4nt'sChatFKink: of6CHldn'tRetKisJrom<ny@Ruy-/A= if?<sk 42DS'tLE 4?;Lo8bli=L7ee..
O,R1)O,R001)/-.."

If you pass arrays to str_replace() as needle and haystack, the replacement is done one at a time. For simplicity's sake, let's take just "We; n7trangMs" as the taget string and start replacing from ;. The first step after replacing "7" with "8s" (the corresponding replacement in the second array):

"We; n8strangMs"

Then replace "8" with "o "

"We; no strangMs"

";" with "'re"

"We're no strangMs"

"M" with "er"

"We're no strangers"

In short, it's a basic compression algorithm where you find character sequences that repeat inside the original text and replace them with a single character. When decompressing that character is replaced with the original sequence. By running the progress iteratively you can compress the once-compressed text again ("o s" => "8s" => "7").

like image 84
JJJ Avatar answered Nov 08 '22 11:11

JJJ


Try it!

Undefined constants are assumed to be strings. This is what it looks like with notices enabled:

Notice: Use of undefined constant T - assumed 'T' in D:\www\htdocs\test\index.php on line 1
Notice: Use of undefined constant q - assumed 'q' in D:\www\htdocs\test\index.php on line 1
Deprecated: Function split() is deprecated in D:\www\htdocs\test\index.php on line 12
We're no strangers to love
You know the rules and so do I
[...]
Never gonna say goodbye
Never gonna tell a lie and hurt you
like image 1
Sverri M. Olsen Avatar answered Nov 08 '22 10:11

Sverri M. Olsen