
How does the Brainfuck Hello World actually work?

Someone sent this to me and claimed it is a hello world in Brainfuck (and I hope so...)

++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.

I know the basics: it works by moving a pointer and incrementing and decrementing stuff...

Yet I still want to know, how does it actually work? How does it print anything on the screen in the first place? How does it encode the text? I do not understand at all...

asked by speeder, Oct 10 '22 02:10



2 Answers

1. Basics

To understand Brainfuck, imagine an infinite array of cells, each initialized to 0.

...[0][0][0][0][0]...

When a Brainfuck program starts, the pointer points at one of these cells (most implementations start at the first cell of the array).

...[0][0][*0*][0][0]...

If you move the pointer right with >, it moves from cell X to cell X+1:

...[0][0][0][*0*][0]...

If you increase the cell value with +, you get:

...[0][0][0][*1*][0]...

If you increase the cell value again with +, you get:

...[0][0][0][*2*][0]...

If you decrease the cell value with -, you get:

...[0][0][0][*1*][0]...

If you move the pointer left with <, it moves from cell X to cell X-1:

...[0][0][*0*][1][0]...
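
The walk-through above can be sketched in Python. This is only an illustrative model, not a real interpreter: the function name run_basic, the five-cell tape, and the starting position are assumptions chosen to match the diagrams.

```python
# Minimal model of the four pointer/value commands described above.
# Assumptions: a finite 5-cell tape and a pointer starting in the middle,
# chosen only to mirror the diagrams; real implementations differ.
def run_basic(program, tape_len=5, start=2):
    tape = [0] * tape_len
    ptr = start
    for cmd in program:
        if cmd == ">":
            ptr += 1          # move pointer right
        elif cmd == "<":
            ptr -= 1          # move pointer left
        elif cmd == "+":
            tape[ptr] += 1    # increase current cell
        elif cmd == "-":
            tape[ptr] -= 1    # decrease current cell
    return tape, ptr

# Same sequence as the diagrams: right, +, +, -, left.
tape, ptr = run_basic(">++-<")
print(tape, ptr)  # [0, 0, 0, 1, 0] 2
```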

2. Input

To read a character, use the comma (,). It reads one character from standard input and writes its decimal ASCII code to the current cell.

Take a look at an ASCII table. For example, the decimal code of ! is 33, while a is 97.

Now let's imagine your BF program's memory looks like this:

...[0][0][*0*][0][0]...

Assuming standard input contains a, the comma operator (,) reads it and stores its decimal ASCII code, 97, in the current cell:

...[0][0][*97*][0][0]...

It is convenient to think of it that way, but the truth is a bit more complex: BF does not read a character but a byte (whatever that byte is). Here is an example.

On Linux:

$ printf ł

prints:

ł

which is a Polish-specific character. It is not part of ASCII. Here it is encoded as UTF-8, so it takes more than one byte in memory. We can prove it with a hexadecimal dump:

$ printf ł | hd

which shows:

00000000  c5 82                                             |..|

The zeroes are the offset. c5 is the first and 82 is the second byte representing ł (in the order they will be read). |..| is the printable-ASCII rendering of those bytes; it shows dots because neither byte is a printable ASCII character.

So if you pass ł as input to a BF program that reads a single byte, the program memory will look like:

...[0][0][*197*][0][0]...

Why 197? Because 197 decimal is c5 hexadecimal. Seem familiar? Of course: it's the first byte of ł!
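
The same check can be done in Python. This is a small sketch, assuming the input is UTF-8 encoded as in the example; io.BytesIO stands in for standard input, and reading one byte from it models a single , command.

```python
import io

# Encode the character the same way the terminal did (UTF-8 is assumed).
data = "ł".encode("utf-8")
print(data.hex(" "))  # c5 82

# Model one ',' command: read exactly one byte into the current cell.
fake_stdin = io.BytesIO(data)   # stand-in for standard input
cell = fake_stdin.read(1)[0]
print(cell)        # 197
print(hex(cell))   # 0xc5
```

A second , would read the remaining byte, 0x82 (130 decimal), into whichever cell is current at that point.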

3. Output

To print a character, use the dot (.). It treats the current cell value as a decimal ASCII code and prints the corresponding character to standard output.

Now let's imagine your BF program's memory looks like this:

...[0][0][*97*][0][0]...

If you use the dot (.) operator now, BF prints:

a

Because the decimal ASCII code of a is 97.

So, for example, a BF program like this (97 pluses, 2 dots):

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++..

will increase the value of the current cell to 97 and print it twice:

aa
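
As a rough Python model of that program (a sketch handling only + and ., with a single cell standing in for the whole tape):

```python
# Build the program described above: 97 pluses followed by 2 dots.
program = "+" * 97 + ".."

cell = 0      # the single cell this program ever touches
output = []
for cmd in program:
    if cmd == "+":
        cell += 1
    elif cmd == ".":
        # '.' emits the character whose code is the current cell value
        output.append(chr(cell))

result = "".join(output)
print(result)  # aa
```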

4. Loops

In BF, a loop consists of a loop begin [ and a loop end ]. You can think of it as a while loop in C/C++ whose condition is the current cell value.

Take a look at the BF program below:

++[]

++ increments the current cell value twice:

...[0][0][*2*][0][0]...

And [] is like while(2) {}, so it's an infinite loop.

Let's say we don't want this loop to be infinite. We can do for example:

++[-]

Each time the loop iterates, it decrements the current cell value. Once the current cell value is 0, the loop ends:

...[0][0][*2*][0][0]...        loop starts
...[0][0][*1*][0][0]...        after first iteration
...[0][0][*0*][0][0]...        after second iteration (loop ends)

Let's consider yet another example of a finite loop:

++[>]

This example shows that a loop doesn't have to finish on the cell it started on:

...[0][0][*2*][0][0]...        loop starts
...[0][0][2][*0*][0]...        after first iteration (loop ends)

However, it is good practice to end where we started. Why? Because if a loop ends on a different cell than the one it started on, we can't easily tell where the cell pointer will be afterwards. To be honest, this practice makes Brainfuck less brainfuck.
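
The three loop examples can be tried with a small Python sketch. It is not a reference implementation: the function name run, the 30000-cell tape, the 8-bit wrap-around, and the max_steps guard (added so that ++[] stops instead of hanging) are all assumptions made for illustration. Input and output commands are left out.

```python
def run(program, tape_len=30000, max_steps=10000):
    """Run a BF program without I/O; return (tape, pointer)."""
    # Pair up every '[' with its matching ']' before running.
    stack, match = [], {}
    for i, c in enumerate(program):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            match[i], match[j] = j, i

    tape, ptr, pc, steps = [0] * tape_len, 0, 0, 0
    while pc < len(program):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit reached (possible infinite loop)")
        c = program[pc]
        if c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256   # assumed 8-bit cells
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == "[" and tape[ptr] == 0:
            pc = match[pc]      # condition false: skip past the loop
        elif c == "]" and tape[ptr] != 0:
            pc = match[pc]      # condition true: jump back to the start
        pc += 1
    return tape, ptr

tape, ptr = run("++[-]")
print(tape[:2], ptr)  # [0, 0] 0

tape, ptr = run("++[>]")
print(tape[:2], ptr)  # [2, 0] 1
```

Calling run("++[]") raises the RuntimeError, which is how this sketch reports the infinite loop.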

answered by Scony, Oct 11 '22 16:10



Wikipedia has a commented version of the code.

+++++ +++++             initialize counter (cell #0) to 10
[                       use loop to set the next four cells to 70/100/30/10
    > +++++ ++              add  7 to cell #1
    > +++++ +++++           add 10 to cell #2 
    > +++                   add  3 to cell #3
    > +                     add  1 to cell #4
    <<<< -                  decrement counter (cell #0)
]                   
> ++ .                  print 'H'
> + .                   print 'e'
+++++ ++ .              print 'l'
.                       print 'l'
+++ .                   print 'o'
> ++ .                  print ' '
<< +++++ +++++ +++++ .  print 'W'
> .                     print 'o'
+++ .                   print 'r'
----- - .               print 'l'
----- --- .             print 'd'
> + .                   print '!'
> .                     print '\n'

To answer your questions, the , and . characters are used for I/O. The text is ASCII.

The Wikipedia article goes on in some more depth, as well.

The first line initialises a[0] = 10 by simply incrementing ten times from 0. The loop from line 2 effectively sets the initial values for the array: a[1] = 70 (close to 72, the ASCII code for the character 'H'), a[2] = 100 (close to 101 or 'e'), a[3] = 30 (close to 32, the code for space) and a[4] = 10 (newline). The loop works by adding 7, 10, 3, and 1 to cells a[1], a[2], a[3] and a[4] respectively each time through the loop, which makes 10 additions for each cell in total (giving a[1] = 70 etc.). After the loop is finished, a[0] is zero. >++. then moves the pointer to a[1], which holds 70, adds two to it (producing 72, which is the ASCII character code of a capital H), and outputs it.
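
The setup loop is just multiplication by repeated addition. Here is a short Python sketch of it, using a plain list named a to mirror the a[0]..a[4] notation above (the list and variable names are only for illustration):

```python
# a[0] is the counter, set to 10 by the first line of the program.
a = [10, 0, 0, 0, 0]

# One pass of the loop body: > +7 > +10 > +3 > +1 <<<< -1
while a[0] != 0:
    a[1] += 7
    a[2] += 10
    a[3] += 3
    a[4] += 1
    a[0] -= 1

after_loop = list(a)
print(after_loop)  # [0, 70, 100, 30, 10]

# >++. : move to a[1], add two, output the character
a[1] += 2
first_char = chr(a[1])
print(first_char)  # H
```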

The next line moves the array pointer to a[2] and adds one to it, producing 101, a lower-case 'e', which is then output.

As 'l' happens to be the seventh letter after 'e', to output 'll' another seven are added (+++++++) to a[2] and the result is output twice.

'o' is the third letter after 'l', so a[2] is incremented three more times and the result is output.

The rest of the program goes on in the same way. For the space and capital letters, different array cells are selected and incremented or decremented as needed.
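
To see the whole message fall out of the arithmetic, the part after the loop can be traced in Python. This is a sketch rather than an interpreter: each (cell, delta) pair below was read off the commented program by hand and stands for "move to that cell, add delta, output".

```python
# Cell values right after the setup loop.
a = [0, 70, 100, 30, 10]

# (cell index, amount added before printing), in program order.
steps = [
    (1, +2),   # 'H'  = 72
    (2, +1),   # 'e'  = 101
    (2, +7),   # 'l'  = 108
    (2, 0),    # 'l'
    (2, +3),   # 'o'  = 111
    (3, +2),   # ' '  = 32
    (1, +15),  # 'W'  = 87
    (2, 0),    # 'o'
    (2, +3),   # 'r'  = 114
    (2, -6),   # 'l'  = 108
    (2, -8),   # 'd'  = 100
    (3, +1),   # '!'  = 33
    (4, 0),    # '\n' = 10
]

output = ""
for cell, delta in steps:
    a[cell] += delta
    output += chr(a[cell])

print(repr(output))  # 'Hello World!\n'
```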

answered by ken, Oct 11 '22 14:10
