
How can I reverse-engineer a Perl program that has been compiled with perlcc?

I inherited an environment that has a "compiled" Perl script on Unix. Is it possible to decompile or reverse-engineer it (whatever the term is) and obtain the source code from the compiled object code?

Might not be possible, but thought I'd ask rather than assume.

Thanks, -Kevin.

souser asked Oct 29 '10 02:10




2 Answers

Leaving out the bytecode backend tchrist already covered and talking only about the C backend: all perlcc does is translate the optree of your compiled Perl program into a C program, which it then compiles. When run, that C program reconstructs the optree in memory and executes it much as perl usually would. The point of that is really just to save the compile time of regular Perl code.

That optree of your program is then available in the PL_main_root global variable. We already have a module called B::Deparse, which is able to consume optrees and turn them into source code that's roughly equivalent to the original code that the optree was compiled from. It happens to have a compile method that returns a coderef that'll, when executed, print the deparse result of PL_main_root.

Also there's the C function Perl_eval_pv, which you can use to evaluate Perl snippets from C space.

$ echo 'print 42, "\n"' > foo.pl
$ perl foo.pl
42
$ perlcc foo.pl
$ ./a.out
42
$ gdb a.out
...
(gdb) b perl_run
Breakpoint 1 at 0x4570e5: file perl.c, line 2213.
(gdb) r
...
Breakpoint 1, perl_run (my_perl=0xa11010) at perl.c:2213
(gdb) p Perl_eval_pv (my_perl, "use B::Deparse; B::Deparse->compile->()", 1)
print 42, "\n";
$1 = (SV *) 0xe47b10

Of course the usual B::Deparse caveats apply, but this will certainly be handy for reverse-engineering. Actually reconstructing the original source code won't be possible in most cases, even though it worked for the above example.

The exact gdb magic you'll have to do to get B::Deparse to give you something sensible also depends largely on your perl. I'm using a perl with ithreads, and therefore multiplicity. That's why I'm passing around the my_perl variable. Other perls might not need that. Also, if anyone stripped the binary compiled by perlcc, things will get a bit harder, but the same technique will still work.

You can also use that to deparse any optree you can somehow get hold of at any time during program execution. Have a look at B::Deparse's compile sub and do something similar, except provide it with a B object for whatever optree you want dumped instead of B::main_root.

The same thing applies to the mentioned bytecode backend of perlcc. I'm not entirely sure about the optimized C backend called CC.

rafl answered Nov 15 '22 22:11


Oh my!

If and only if it was compiled into executable byte code via perlcc -B, you could then uncompile it the same way B::Deparse does. You'd get back all of the source that wasn't optimized away that way. It might look a bit funny, but it would be an equivalent program.

However, if it was fully compiled into C code and thence to assembler and machine language and run through ld for a proper a.out file, you aren't going to be able to do anything like that. It'd be like trying to disassemble /bin/cat.

So ok, you could disassemble it, but there's no joy to be had there. Even if you could get out the original, generated C code — which you cannot — it would be virtually unusable.

I suppose you might try running strings(1) on it to see whether anything useful got left lying around somewhere permanent, but I wouldn't count on it.
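For what it's worth, a strings pass could look something like the sketch below. The function name and the grep pattern are my own rough heuristic for "looks like Perl", not anything perlcc guarantees to leave behind.

```shell
# List printable strings of at least 8 characters in a binary and keep the
# ones that resemble Perl source. The pattern is only a rough heuristic.
scan_perl_strings() {
    strings -n 8 "$1" | grep -E 'use strict|sub [A-Za-z_]+|my \$|print ' || true
}

# Usage, assuming the compiled program is ./a.out:
# scan_perl_strings ./a.out
```

An empty result just means nothing matched the pattern; widen or drop the grep to look at everything strings finds.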

Sorry.

tchrist answered Nov 15 '22 22:11