Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating the shortest Turing-complete interpreter [closed]

Python, 250 bytes.

This one is longer than ilya n.'s Python solution, but the language is a little easier to grasp. Each command in this language (I call it Kaputt, the German word for "broken") is one byte. The following 6 commands are defined:

  • 0: Push a zero onto the stack
  • 1: Push a one onto the stack
  • I: (if) Pop a value off the stack (which must be zero or one). Run the following block of code (until "i") if it's a one; skip the block if it's a zero.
  • i: (endif) Ends the if block and pushes a one if the block was not run (because the "I" popped a zero). See below for an explanation of the latter.
  • D: (def) Pops the name of the to-be-defined function off the stack (see below) and binds the following block (until "d") to that name.
  • d: (enddef) Ends the function definition.
  • Any other byte: Check if there is a function by this (one-byte; but see the edit below) name. If so, run this function; if not, push this byte onto the stack for immediate use by D.

Nested ifs are legal; nested function definitions are not. The fact that i (endif) pushes a one if and only if the corresponding if block was not run allows for the following idiom resembling an if/else/endif structure:

# [code that left a one or zero on the stack]
I    # abbreviated "(" below
# [code to be run if it was a one]
0iI  # abbreviated "/" below
# [code to be run if it was a zero]
1iIi # abbreviated ")" below
# [continuing...]

Note that comments, linebreaks, spaces etc. are not actually allowed.

Here are some examples of common functions. These make use of the abbreviations ( / ) mentioned above.

<D(/)d

defines the function < (pop) that pops a value off the stack without using it for anything.

&D((1/0)/<0)d

defines the function & (and) that pops two values of the stack and pushes a one if both values are ones, pushes a zero otherwise.

~D((11/10)/(01/00))d

defines the function ~ (swap) that swaps the top two values on the stack.

RD(R/<)d

defines the function R (remove) that recursively removes all ones from the top of the stack, and then removes two more values (the top one of which should be zero).

The following interpreter function expects the program p, which is a string (or any other iterable of bytes), and the input stack S, which is a (possibly empty) list of bytes. After the interpreter has run, this list contains the output stack.

def i(p,S,F=0):
 A=S.append
 F=F or{}
 C=0
 k=[]
 for c in p:
  h=0in k
  if c=="d":C=0
  elif C!=0:C+=[c]
  elif c=="I":k+=[int(h or S.pop())]
  elif c=="i":k.pop()or A("1")
  elif 1-h:
   if c=="D":C=F[S.pop()]=[]
   else:i(F.get(c)or A(c)or"",S,F)

Obviously, there was no room for error checking, so i() expects legal Kaputt code. Test cases:

script = "<D(/)d"                 # < = pop
script += "&D((1/0)/<0)d"         # & = and
script += "~D((11/10)/(01/00))d"  # ~ = swap
script += "RD(R/<)d"              # R = remove
script += "|D(<1/(1/0))d"         # | = or
script=script.replace("(","I")   
script=script.replace("/","0iI")
script=script.replace(")","1iIi") # ( and / and ) are no real commands of the language.

S=[]
i(script+"1111010111111111R",S)
assert S == ["1","1","1","1","0"]

S=[]
i(script+"00&",S)
assert S == ["0"]

S=[]
i(script+"01~",S)
assert S == ["1","0"]

S=[]
i(script+"00|",S)
assert S == ["0"]

S=[]
i(script+"01|",S)
assert S == ["1"]

Happy coding :-)

Edit: There is nothing inherent in the interpreter that forces tokens to be one byte only. Requiring this was more for consistency with the built-in commands (which are one-byte because that makes the interpreter shorter). If you pass a list of strings instead of a string, you can write more readable Kaputt code like this:

script = """
inc D
    (
        (
            0 0
        /
            1 0
        )
    /
        1
    )
d
""".replace("(","I").replace("/","0 i I").replace(")","1 i I i").split()

This defines a function inc that increments the two-bit number on top of the stack (LSB on top).

Test:

for n in xrange(4):
    S=[]
    i(script + [str((n & 2)/2), str(n & 1), "inc"], S)
    assert S == [str(((n+1) & 2)/2), str((n+1) & 1)]

Let's call multi-byte function names a language extension :-)


Python program that interprets a language I just created:

from random import randint
z = [randint(0,1), None]  # example: input data is 0 or 1
x = '_b_ed_a_i____d_ai'   # this program will set p = data[1] = not data[0]

# input x[i], data z[k]
# jumper j
# p is +1 or -1 each step
# commands:
#    a   set data = 0 or 1
#    b   j += 0 or +-9 depending on data
#    d   move data left or right
#    e   perform jump left or right
#    j   set jumper to 0
#    i   end program
# output: p or some data[x], both possible

g = globals()
p = i = 1
a = b = d = j = e = k = 0
while i:
    h = x[i]
    g[h] = p = -p
    i += 1 + j*e
    if a: z[k] = i % 2
    if z[k]: j += 9*b
    k += d
    g[h] = 0        
#    print(a, b, d, e, i, j, k, h, z)

print('Input:', z, 'Output:', p > 0)

Optimized version:

g=globals()
p=i=1
a=b=d=j=e=k=0
while i:
 h=x[i]
 g[h]=p=-p
 i+=1+j*e
 if a:z[k]=i%2
 if z[k]:j+=9*b
 k+=d
 g[h]=0

114 bytes

Update: I want to add that the point of my program is not to create any practical language, and not even to somehow have the 'best' (=='shortest') interpreter, but rather to demonstrate interesting programming tricks. For example, I am relying on direct access to global variables through globals(), so that I never even test for j command, saving precious bytes :)


#include "/dev/tty"

Assemble the code below using A86 to get a 150 byte BF interpreter!

    add dh,10h
    push dx
    add dh,10h
    push dx
    mov bl,80h
    lea dx,[bx+2]
    add bl,byte ptr [bx]
    mov byte ptr [bx+1],al
    mov ah,3dh
    int 21h
    pop ds,es
    jc l14
    mov bx,ax
    mov ah,3fh  
    mov cx,di
    xor dx,dx
    int 21h
    jc l14
    mov bx,ax
    xor ax,ax
    rep stosw
    xor di,di
    xor si,si
    inc ch
l1:
    cmp si,bx
    jae l14
    lodsb
    mov bp,8
    push l1
l2: 
    dec bp
    js ret
    cmp al,[bp+l15]
    jne l2
l3:
    mov cl,[bp+8+l15]
    jmp cx
l4:
    inc di  
    ret
l5:
    dec di
    ret
l6:
    es inc byte ptr [di]
    ret
l7:
    es dec byte ptr [di]
    ret
l8:
    mov ah,2
    es mov dl,[di]
    int 21h
    ret
l9:
    mov ah,8
    int 21h
    es mov [di],al
    ret
l10:
    es cmp byte ptr [di],dh
    jne ret
l11:    
    cmp si,bx
    jae l14
    lodsb
    cmp al,']'
    jne l11
    ret
l12:
    es cmp byte ptr [di],dh
    je ret
l13:    
    dec si
    jz l14
    cmp byte ptr [si-1],'['
    jne l13
l14:
    ret
l15:
    db '>'
    db '<'
    db '+'
    db '-'
    db '.'
    db ','
    db '['
    db ']'
    db offset l4-100h
    db offset l5-100h
    db offset l6-100h
    db offset l7-100h
    db offset l8-100h
    db offset l9-100h
    db offset l10-100h
    db offset l12-100h

Specify a filename on the command line, no need for double quotes, as the BF source.


Not mine, but take a look at the 210 bit binary lambda calculus self-interpreter here.


Written in Perl, 140 characters long, including the shell invocation and flags:

perl -ne'push@a,split;if(eof){$i=0;while($i>=0){($a,$b,$c)=@a[$i..$i+2];$b>-1?$a[$b]-=$a[$a]:print chr$a[$a];$i=$b>-1&&$a[$b]<=0?$c:$i+3;}}'

Readable version:

#!/usr/bin/perl -n
push @prog, split /\s+/, $_;
if(eof) {
  $i = 0;
  while($i >= 0) {
    ($a, $b, $c) = @prog[$i .. $i + 2];
    if($b > -1) {
      $prog[$b] -= $prog[$a];
    } else {
      print chr $prog[$a];
    }
    if($b > -1 and $prog[$b] <= 0) {
      $i = $c;
    } else {
      $i + 3;
    }
  }
}

The above is an interpreter for pseudo-machine code for a One Instruction Set Computer using the subleq instruction. Basically, the source code should just be a bunch of numbers separated by whitespace. A simple test program to verify my results (should print "Hi" and a Unix newline):

0 0 6 72 105 10 3 -1 9 4 -1 12 5 -1 15 0 0 -1

Readable version of the test input (works just as well):

0    0     6
72   105   10
3   -1     9
4   -1     12
5   -1     15
0    0    -1