
How does this program duplicate itself?

Tags: c, quine

This code is from Hacker's Delight. The book says it is the shortest such program in C, at 64 characters, but I don't understand it:

    main(a){printf(a,34,a="main(a){printf(a,34,a=%c%s%c,34);}",34);}

I tried compiling it: it compiles with 3 warnings and no errors.

PDP asked Apr 24 '15 01:04


3 Answers

This program relies upon the assumptions that

  • the return type of main defaults to int,
  • a parameter declared without a type defaults to int, and
  • the argument a="main(a){printf(a,34,a=%c%s%c,34);}" will be evaluated first.

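It invokes undefined behavior: the order in which function arguments are evaluated is unspecified in C, and here a is both read (as the first argument) and modified (in the third argument) with no sequence point in between.
Nevertheless, where it happens to work, the program behaves as follows: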

The assignment expression a="main(a){printf(a,34,a=%c%s%c,34);}" assigns the string "main(a){printf(a,34,a=%c%s%c,34);}" to a, and the value of the whole assignment expression is that same string, as per the C standard (C11 6.5.16):

An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment [...]

Keeping the above semantics of the assignment operator in mind, the program effectively expands to

    main(a){
        printf("main(a){printf(a,34,a=%c%s%c,34);}",34,a="main(a){printf(a,34,a=%c%s%c,34);}",34);
    }

ASCII 34 is ". The specifiers and their corresponding arguments:

%c ---> 34 
%s ---> "main(a){printf(a,34,a=%c%s%c,34);}" 
%c ---> 34  

A better version would be

main(a){a="main(a){a=%c%s%c;printf(a,34,a,34);}";printf(a,34,a,34);}  

It is 4 characters longer, but at least it follows K&R C.

haccks answered Oct 17 '22 21:10


It relies on several quirks of the C language and (what I think is) undefined behavior.

First, it defines the main function. In pre-C99 C it is legal to define a function without a return type or parameter types, and they will be presumed to be int. This is why the main(a){ part works.

Then, it calls printf with 4 arguments. Since printf has no prototype in scope, it is implicitly declared as a function returning int with unspecified parameters (unless your compiler implicitly declares it otherwise, like Clang does).

The first argument is a, main's own parameter, which is presumed int and holds argc at the beginning of the program. The second argument is 34 (which is ASCII for the double-quote character). The third argument is an assignment expression that assigns the format string to a and yields it. It relies on an implicit pointer-to-int conversion, which most compilers accept with a warning. The last argument is another quote character in numeric form.

At runtime, the %c format specifiers are substituted with quotes, the %s is substituted with the format string, and you get the original source again.

As far as I know, the order of argument evaluation is unspecified. This quine works because the assignment a="main(a){printf(a,34,a=%c%s%c,34);}" is evaluated before a is passed as the first argument to printf, but as far as I know, there is no rule to enforce it. Additionally, this can break on 64-bit platforms where int is 32 bits, because the pointer-to-int conversion truncates the pointer to a 32-bit value. As a matter of fact, even though I can see how it works on some platforms, it doesn't work on my computer with my compiler.

zneak answered Oct 17 '22 20:10


This works based on lots of quirks that C allows you to do, and some undefined behavior that happens to work in your favor. In order:

main(a) { ...

Types are assumed to be int if unspecified, so this is equivalent to:

int main(int a) { ...

Even though main is supposed to take either 0 or 2 arguments, and this is undefined behavior, in practice it is usually tolerated and the missing second argument is simply ignored.

Next, the body, which I will space out. Note that a is an int as per main:

printf(a,
       34,
       a = "main(a){printf(a,34,a=%c%s%c,34);}",
       34);

The order of evaluation of arguments is unspecified, but we're relying on the 3rd argument - the assignment - getting evaluated first. We're also relying on the undefined behavior of being able to assign a char * to an int. Also, note that 34 is the ASCII value of ". Thus, the intended effect of the program is:

#include <stdio.h>

int main(int a, char **argv) {
    printf("main(a){printf(a,34,a=%c%s%c,34);}",
           '"',
           "main(a){printf(a,34,a=%c%s%c,34);}",
           '"');
    return 0; // also left off
}

Which, when evaluated, produces:

main(a){printf(a,34,a="main(a){printf(a,34,a=%c%s%c,34);}",34);}

which was the original program. Tada!

Barry answered Oct 17 '22 21:10