
Translate algorithmic C to Python

I would like to translate some C code to Python code or bytecode. The C code in question is what I'd call purely algorithmic: platform-independent, no I/O, just algorithms and in-memory data structures.

An example would be a regular expression library. The translation tool would process the library's source code and produce a functionally equivalent Python module that can be run in a sandboxed environment.

What specific approaches, tools and techniques can you recommend?


Note: a Python C extension or ctypes is not an option because the environment is sandboxed.

Another note: it looks like there is a C-to-Java-bytecode compiler; it has even been used to compile libjpeg to Java. Is the Java bytecode+VM too different from the CPython bytecode+VM?

Constantin asked Sep 25 '08 09:09



2 Answers

Frankly, there is no way to translate C to Python mechanically and meaningfully without suffering an enormous performance penalty. Python isn't anywhere near C speed (with current compilers and interpreters), but worse, the things C is good at (bit-fiddling, integer math, tricks with blocks of memory) are exactly what Python is very slow at, and what Python is good at can't be expressed directly in C. A direct translation would therefore be extra inefficient, to the point of absurdity.
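To make the mismatch concrete, here is a minimal sketch of what a line-by-line translation of a small C routine tends to look like. The 32-bit FNV-1a hash is my own choice of illustration, not something from the question, and the names `fnv1a_32` and `to_int32` are hypothetical. In C the `uint32_t` multiply wraps around for free; in Python every such step needs an explicit mask because integers are unbounded.

```python
MASK32 = 0xFFFFFFFF  # emulates the width of a C uint32_t


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash, written the way a direct C translation would look."""
    h = 2166136261  # FNV offset basis
    for byte in data:
        h ^= byte
        # C: h *= 16777619u;  (wraps modulo 2**32 automatically)
        h = (h * 16777619) & MASK32
    return h


def to_int32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed C int32_t."""
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x
```

Each byte costs several interpreted operations plus the masking, which is the kind of overhead a mechanical translator would introduce everywhere.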

The much, much better approach in general is to keep the C code as C and wrap it in a Python extension module (using SWIG, Pyrex, Cython, or a hand-written wrapper), or to call the C library directly using ctypes. You get all the benefits (and downsides) of C for the code that is already C or that you add in C later, and all the convenience (and downsides) of Python for any code written in Python.
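For reference, calling an existing C function through ctypes can look roughly like the sketch below. It assumes a POSIX system where the C runtime can be located; the helper names `load_libc` and `c_strlen` are hypothetical, and `strlen` merely stands in for whatever function the real library exports.

```python
import ctypes
import ctypes.util


def load_libc():
    """Load the platform C runtime (assumption: a POSIX-like system)."""
    name = ctypes.util.find_library("c")
    # On POSIX, CDLL(None) exposes symbols already linked into the process.
    return ctypes.CDLL(name) if name else ctypes.CDLL(None)


libc = load_libc()
# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of data up to the first NUL byte, as computed by C."""
    return libc.strlen(data)
```

As the question notes, this route is unavailable inside the sandbox; it only shows what "keeping the C as C" means in practice.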

That won't satisfy your 'sandboxing' needs, but you should realize that you cannot sandbox Python particularly well anyway: it takes a lot of effort and modification of CPython, and if you miss one little hole somewhere, your jail is broken. If you want to sandbox Python, start by sandboxing the entire process; then C extensions can be sandboxed too.

Thomas Wouters answered Nov 01 '22 22:11



Use indent(1) and ctopy(1). For extra credit, test speeds on PyPy. For bonus credit, use pyastra to generate assembly code.

Regardless of language, you will always face a trade-off between recomputing the outputs of various constructs and functions at run time (CPU) and storing them in memory (RAM).

Check the Great Language Shootout if you want to see what I'm talking about. Either way, this is too much comp-sci snobbery.

Here is an example. Want to do floating-point-style math without using floating-point numbers?

x * 1,000,000 = a
y * 1,000,000 = b
a {function} b = result
result / 1,000,000 = z
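The scaling trick above can be sketched in Python as fixed-point arithmetic on plain integers. This is a minimal illustration, assuming six decimal digits of precision; all function names are hypothetical. One detail the generic "{function}" step glosses over: addition works directly on the scaled values, but multiplication has to divide by the scale once more, and division has to multiply by it first.

```python
SCALE = 1_000_000  # six decimal digits, matching the factor used above


def to_fixed(text: str) -> int:
    """Parse a decimal string such as '1.5' into a scaled integer (no floats)."""
    sign = -1 if text.startswith("-") else 1
    whole, _, frac = text.lstrip("+-").partition(".")
    frac = (frac + "000000")[:6]  # pad or truncate to six digits
    return sign * (int(whole or "0") * SCALE + int(frac))


def fixed_add(a: int, b: int) -> int:
    return a + b  # both operands already share the same scale


def fixed_mul(a: int, b: int) -> int:
    return (a * b) // SCALE  # the product carries SCALE twice, remove one


def fixed_div(a: int, b: int) -> int:
    return (a * SCALE) // b  # pre-scale so the quotient keeps its scale


def fixed_to_str(value: int) -> str:
    """Format a scaled integer back into a decimal string."""
    sign = "-" if value < 0 else ""
    whole, frac = divmod(abs(value), SCALE)
    return f"{sign}{whole}.{frac:06d}"
```

For instance, `fixed_to_str(fixed_mul(to_fixed("1.5"), to_fixed("0.25")))` gives `"0.375000"`. Note that `//` rounds toward negative infinity, so results for negative values may differ in the last digit from truncating C division.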

Don't get bogged down, get primal, use caveman math if you have to.

John S answered Nov 01 '22 23:11
