 

How to build a tokenizer in PHP?

Tags: php, tokenize

I'm building a site for learning basic programming. Users will submit code written in a pseudolanguage, and I need to interpret it. However, I'm not sure how to build a tokenizer in PHP.

Given a snippet such as this one:

a = 1
b = 2
c = a - b

if(a > b) {
    buy(a)
} else {
    buy(b)
}

How would I go about separating this code into tokens?

--

This is what I'm trying now:

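$tokens = array();

// First call: pass the source string and the delimiter
$token = strtok($botCode, '=');

// Later calls pass only the delimiter; strtok returns false when nothing is left.
// Comparing with !== false keeps a "0" token and avoids appending the final false.
while ($token !== false) {
    $tokens[] = $token;
    $token = strtok('=');
}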

However, I haven't figured out how to use strtok with a list of regular expressions. I could write something similar to strtok that accepts an array of needles, using substr and strrpos, but it seems to me this should be possible with strtok itself, since it's designed for exactly this. Any information or pointers in the right direction would be appreciated.

asked Feb 21 '13 03:02 by lisovaccaro





1 Answer

Don't expect any magic from strtok. Like preg_split, it only splits a string on delimiters; it does not tell you what kind of token each piece is.
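To illustrate the limit of delimiter-based splitting, here is a minimal sketch using preg_split on one line of the question's snippet. The input string and the set of operator characters are assumptions chosen for this example, not part of the original question.

```php
<?php
// Hypothetical input: a single line of the pseudolanguage from the question.
$line = 'c = a - b';

// Split on runs of whitespace and on single operator/punctuation characters.
// PREG_SPLIT_DELIM_CAPTURE keeps the captured operators in the result;
// PREG_SPLIT_NO_EMPTY drops the empty strings left between adjacent delimiters.
$pieces = preg_split(
    '/\s+|([=<>+\-(){}])/',
    $line,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($pieces);
```

The result is a flat list of strings with no type attached, so a later stage would still have to work out whether each piece is a name, a number, or an operator.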

I think you want to build your own lexer. The article "Writing a simple lexer in PHP" is one place to start, or you could follow a similar guide.
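As a rough idea of what such a lexer could look like for the snippet in the question, here is a minimal sketch. It is not the code from the article mentioned above; the token type names, the TOKEN_PATTERNS table, and the tokenize function are all hypothetical and only cover the constructs shown in the question.

```php
<?php
// Hypothetical token table for the question's pseudolanguage: regex => type.
// Order matters: earlier patterns win, so keywords are listed before identifiers.
const TOKEN_PATTERNS = [
    '\s+'            => 'T_WHITESPACE',
    '\d+'            => 'T_NUMBER',
    '(?:if|else)\b'  => 'T_KEYWORD',
    '[A-Za-z_]\w*'   => 'T_IDENTIFIER',
    '[=<>+\-]'       => 'T_OPERATOR',
    '[(){}]'         => 'T_PUNCTUATION',
];

function tokenize(string $source): array
{
    $tokens = [];
    $offset = 0;
    $length = strlen($source);

    while ($offset < $length) {
        foreach (TOKEN_PATTERNS as $pattern => $type) {
            // \G anchors the match at $offset, so nothing is skipped silently.
            if (preg_match('/\G' . $pattern . '/', $source, $m, 0, $offset)) {
                if ($type !== 'T_WHITESPACE') {
                    $tokens[] = ['type' => $type, 'value' => $m[0]];
                }
                $offset += strlen($m[0]);
                continue 2; // resume the while loop at the new offset
            }
        }
        // No pattern matched at this position.
        throw new RuntimeException("Unexpected character at offset $offset");
    }

    return $tokens;
}

// Usage: print one token per line as "TYPE value".
foreach (tokenize('if(a > b) {') as $t) {
    echo $t['type'], ' ', $t['value'], PHP_EOL;
}
```

Unlike strtok or preg_split, each token here carries a type, which is what an interpreter or parser needs as input. Whitespace is discarded in this sketch; a language where line breaks end statements would need to keep newlines as their own token type.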

answered Sep 28 '22 20:09 by sectus
