Validate user inputted PHP code before passing it to eval()

Tags:

php

Before passing a string to eval(), I would like to make sure that its syntax is correct and that it contains only:

  1. Two functions: a() and b()
  2. Four operators: /*-+
  3. Brackets: ()
  4. Numbers: 1.2, -1, 1

How can I do this? Maybe it has something to do with the PHP Tokenizer?

I'm actually trying to make a simple formula interpreter so a() and b() will be replaced by ln() and exp(). I don't want to write a tokenizer and parser from scratch.

Asked by hidarikani on Aug 08 '11 09:08



2 Answers

As far as validation is concerned, the following character tokens are valid:

operator: [/*+-]
funcs:    (a\(|b\()
brackets: [()]
numbers:  \d+(\.\d+)?
space:    [ ]

A simple validation can then check whether the input string consists entirely of these patterns. Because the funcs token is fairly precise and does not clash much with the other tokens, this validation should be quite robust without needing to implement any syntax or grammar yet:

$tokens = array(
    'operator' => '[/*+-]',
    'funcs'    => '(a\(|b\()',
    'brackets' => '[()]',
    'numbers'  => '\d+(\.\d+)?',
    'space'    => '[ ]',
);

// Wrap each token pattern in a non-capturing group and join them with "|"
$pattern = '';
foreach ($tokens as $token)
{
    $pattern .= sprintf('|(?:%s)', $token);
}
// Anchor the alternation to the whole string; the possessive *+ stops PCRE from
// backtracking into it, which a plain * can do exponentially on long invalid input
$pattern = sprintf('~^(?:%s)*+$~', ltrim($pattern, '|'));

echo $pattern;

The input validates only if the whole string matches this token-based pattern. It may still be syntactically invalid PHP, but you can be sure it is built only from the specified tokens. The code above prints:

~^(?:(?:[/*+-])|(?:(a\(|b\())|(?:[()])|(?:\d+(\.\d+)?)|(?:[ ]))*+$~
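Applying such a whitelist pattern is then a single preg_match() call. The sketch below is a minimal example under a few assumptions: isValidFormula() is a hypothetical helper name, the pattern is a compacted form of the generated one, and the possessive *+ is an addition meant to keep PCRE from backtracking heavily on long invalid input. It only checks tokens, so incomplete input such as "1 +" (and an empty string) still passes.

```php
<?php
// Hypothetical helper: returns true only if $input consists solely of the
// whitelisted tokens (operators, a( / b(, brackets, numbers, spaces).
// It does NOT check syntax: "1 +" is accepted.
function isValidFormula(string $input): bool
{
    // Compact token alternation; *+ (possessive) prevents runaway backtracking
    $pattern = '~^(?:[/*+-]|a\(|b\(|[()]|\d+(?:\.\d+)?|[ ])*+$~';
    return preg_match($pattern, $input) === 1;
}

var_dump(isValidFormula('a(1.2) * -1 + b(3)')); // bool(true)
var_dump(isValidFormula('system("ls")'));       // bool(false)
var_dump(isValidFormula('1 +'));                // bool(true): tokens only, syntax not checked
```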

Building the pattern dynamically, as in the example, makes it easier to modify your language's tokens later on.

Additionally, this can be the first step towards your own tokenizer/lexer. The token stream can then be passed on to a parser, which can syntactically validate and interpret it. That's the part user187291 wrote about.
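A minimal lexer along those lines might look like the sketch below. It assumes the same token table (with the groups made non-capturing and runs of spaces merged into one token); tokenize() is a hypothetical function that walks the input with preg_match(), using \G to anchor each match at the current offset, and throws on the first character no token matches.

```php
<?php
// Hypothetical lexer built on the token table: turns the input into a list of
// [type, text] pairs and throws on the first character no pattern matches.
// Array order decides which pattern is tried first.
function tokenize(string $input): array
{
    $tokens = [
        'funcs'    => 'a\(|b\(',
        'operator' => '[/*+-]',
        'brackets' => '[()]',
        'numbers'  => '\d+(?:\.\d+)?',
        'space'    => '[ ]+',
    ];

    $result = [];
    $offset = 0;
    $length = strlen($input);

    while ($offset < $length) {
        foreach ($tokens as $type => $regex) {
            // \G anchors the match exactly at the current offset
            if (preg_match('~\G(?:' . $regex . ')~', $input, $m, 0, $offset) === 1) {
                if ($type !== 'space') {
                    $result[] = [$type, $m[0]];   // whitespace is dropped
                }
                $offset += strlen($m[0]);
                continue 2;                       // resume the outer while loop
            }
        }
        throw new InvalidArgumentException("Unexpected character at offset $offset");
    }

    return $result;
}

print_r(tokenize('a(1.2)*-1'));
```

For "a(1.2)*-1" this yields ['funcs', 'a('], ['numbers', '1.2'], ['brackets', ')'], ['operator', '*'], ['operator', '-'], ['numbers', '1'], which is the kind of stream a parser would consume next.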

As an alternative to writing a full lexer and parser, if you need to validate the syntax you can also formulate your grammar in terms of tokens, and then apply a regex-based grammar to the token representation of the input.

The tokens are the words of your grammar. You will need to describe parentheses and functions more precisely than the token list above does, and the tokenizer should follow clearer rules about which token takes precedence over another. The concept is outlined in another question of mine. It also uses regex for grammar formulation and syntax validation, but it still does not parse. In your case, eval would be the parser you make use of.
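To illustrate the token-grammar idea, here is a sketch under stated assumptions: each token is reduced to a single character (f for a( or b(, n for a number, operators and brackets as themselves), and a recursive PCRE pattern with a (?(DEFINE) ...) block describes the grammar over that string. The function names and the grammar itself, including allowing a unary minus in front of any term, are hypothetical choices for illustration, not taken from the linked question.

```php
<?php
// Hypothetical sketch: reduce the input to a compact token string
// ("f" = a( or b(, "n" = number, operators and brackets as themselves),
// then validate the syntax with a grammar regex over that string.
function toTokenString(string $input): ?string
{
    // "." with the s flag catches every byte outside the whitelist, including newlines
    preg_match_all('~a\(|b\(|\d+(?:\.\d+)?|[-+*/()]|[ ]+|.~s', $input, $m);

    $out = '';
    foreach ($m[0] as $lexeme) {
        if ($lexeme[0] === ' ') {
            continue;                                   // skip whitespace
        } elseif ($lexeme === 'a(' || $lexeme === 'b(') {
            $out .= 'f';                                // function call incl. its "("
        } elseif (is_numeric($lexeme)) {
            $out .= 'n';                                // number
        } elseif (in_array($lexeme, ['-', '+', '*', '/', '(', ')'], true)) {
            $out .= $lexeme;                            // operators and brackets
        } else {
            return null;                                // character not in the language
        }
    }
    return $out;
}

function isValidSyntax(string $input): bool
{
    $tokens = toTokenString($input);
    if ($tokens === null) {
        return false;
    }
    // Token grammar, using recursion for nested brackets:
    //   term := "-"? ( "n" | ("f" | "(") expr ")" )
    //   expr := term ( ("+" | "-" | "*" | "/") term )*
    $grammar = '~^(?&expr)$
        (?(DEFINE)
            (?<term> -? (?: n | [f(] (?&expr) \) ) )
            (?<expr> (?&term) (?: [-+*/] (?&term) )* )
        )~x';

    return preg_match($grammar, $tokens) === 1;
}

var_dump(isValidSyntax('a(1.2) * -1 + b(3)')); // token string "fn)*-n+fn)"
var_dump(isValidSyntax('1 +'));
```

Unlike the plain whitelist, this rejects incomplete input such as "1 +" or "(1", while the grammar stays a single regex that is easy to adjust.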

Answered by hakre on Sep 29 '22 01:09


Parser generators have indeed already been written for PHP, and "LIME" in particular comes with the typical "calculator" example, which would be an obvious starting point for your "mini language": http://sourceforge.net/projects/lime-php/

It's been years since I last played with LIME, but it was already mature and stable then.

Notes:

1) Using a full-on parser generator gives you the advantage of avoiding PHP eval() entirely if you wish: you can make LIME emit a parser that effectively provides an "eval" function for expressions written in your mini language, with validation baked in. It also makes it easy to add support for new functions as needed.

2) It may seem like overkill at first to use a parser generator for such an apparently small task, but once you get the examples working you'll be impressed by how easy it is to modify and extend them. And it's very easy to underestimate the difficulty of writing a bug-free parser (even a "trivial" one) from scratch.
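To make note 1 concrete without guessing at LIME's API, the sketch below swaps in a different technique: a small hand-written recursive-descent evaluator (not LIME output). It validates and evaluates in one pass, so eval() is never needed. The class name, the grammar, and the mapping of a() to PHP's log() (natural logarithm) and b() to exp() are assumptions based on the question; it needs PHP 7.4 or later for typed properties and the arrow function. In line with note 2, even this small parser leaves edge cases such as division by zero to PHP's default behavior.

```php
<?php
// Hand-written recursive-descent evaluator (NOT generated by LIME): it
// validates and evaluates in one pass, so eval() is never called.
// Assumed grammar:
//   expr   := term (("+" | "-") term)*
//   term   := factor (("*" | "/") factor)*
//   factor := "-" factor | number | ("a(" | "b(" | "(") expr ")"
// a() maps to log() (natural log) and b() to exp(), per the question.
final class FormulaEvaluator
{
    private array $tokens = [];
    private int $pos = 0;

    public function evaluate(string $input): float
    {
        // Split into lexemes; "." with the s flag catches anything unexpected
        preg_match_all('~a\(|b\(|\d+(?:\.\d+)?|[-+*/()]|[ ]+|.~s', $input, $m);
        $this->tokens = array_values(array_filter($m[0], fn(string $t): bool => $t[0] !== ' '));
        $this->pos = 0;

        $value = $this->expr();
        if ($this->pos !== count($this->tokens)) {
            throw new InvalidArgumentException('Unexpected token: ' . $this->tokens[$this->pos]);
        }
        return $value;
    }

    private function peek(): ?string
    {
        return $this->tokens[$this->pos] ?? null;
    }

    private function next(): string
    {
        $token = $this->peek();
        if ($token === null) {
            throw new InvalidArgumentException('Unexpected end of input');
        }
        $this->pos++;
        return $token;
    }

    private function expr(): float
    {
        $value = $this->term();
        while (in_array($this->peek(), ['+', '-'], true)) {
            $value = $this->next() === '+' ? $value + $this->term() : $value - $this->term();
        }
        return $value;
    }

    private function term(): float
    {
        $value = $this->factor();
        while (in_array($this->peek(), ['*', '/'], true)) {
            $value = $this->next() === '*' ? $value * $this->factor() : $value / $this->factor();
        }
        return $value;
    }

    private function factor(): float
    {
        $token = $this->next();
        if ($token === '-') {
            return -$this->factor();                  // unary minus
        }
        if (is_numeric($token)) {
            return (float) $token;
        }
        if ($token === 'a(' || $token === 'b(' || $token === '(') {
            $inner = $this->expr();
            if ($this->next() !== ')') {
                throw new InvalidArgumentException('Expected ")"');
            }
            if ($token === 'a(') {
                return log($inner);
            }
            return $token === 'b(' ? exp($inner) : $inner;
        }
        throw new InvalidArgumentException('Unexpected token: ' . $token);
    }
}

$evaluator = new FormulaEvaluator();
echo $evaluator->evaluate('2*(3-1)/4'), "\n";
```

Adding another function would mean one more lexeme in the preg_match_all() pattern and one more branch in factor(), which is the kind of extension note 1 describes.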

Answered by Peter on Sep 29 '22 00:09