PHP string console parameters to array

Tags:

I would like to know how I could transform the given string into the specified array:

String

all ("hi there \(option\)", (this, that), other) another

Result wanted (Array)

[0] => all,
[1] => Array(
    [0] => "hi there \(option\)",
    [1] => Array(
        [0] => this,
        [1] => that
    ),
    [2] => other
),
[2] => another

This is used for a kind of console that I'm making on PHP. I tried to use preg_match_all but, I don't know how I could find parentheses inside parentheses in order to "make arrays inside arrays".

EDIT

All other characters that are not specified on the example should be treated as String.

EDIT 2

I forgot to mention that all parameter's outside the parentheses should be detected by the space character.

974

asked Feb 04 '13 10:02

Cristiano Santos

2 Answers

The 10,000ft overview

You need to do this with a small custom parser: code takes input of this form and transforms it to the form you want.

In practice I find it useful to group parsing problems like this in one of three categories based on their complexity:

Trivial: Problems that can be solved with a few loops and humane regular expressions. This category is seductive: if you are even a little unsure if the problem can be solved this way, a good rule of thumb is to decide that it cannot.
Easy: Problems that require building a small parser yourself, but are still simple enough that it doesn't quite make sense to bring out the big guns. If you need to write more than ~100 lines of code then consider escalating to the next category.
Involved: Problems for which it makes sense to go formal and use an already existing, proven parser generator¹.

I classify this particular problem as belonging into the second category, which means that you can approach it like this:

Writing a small parser

Defining the grammar

To do this, you must first define -- at least informally, with a few quick notes -- the grammar that you want to parse. Keep in mind that most grammars are defined recursively at some point. So let's say our grammar is:

The input is a sequence
A sequence is a series series of zero or more tokens
A token is either a word, a string or an array
Tokens are separated by one or more whitespace characters
A word is a sequence of alphabetic characters (a-z)
A string is an arbitrary sequence of characters enclosed within double quotes
An array is a series of one or more tokens separated by commas

You can see that we have recursion in one place: a sequence can contain arrays, and an array is also defined in terms of a sequence (so it can contain more arrays etc).

Treating the matter informally as above is easier as an introduction, but reasoning about grammars is easier if you do it formally.

Building a lexer

With the grammar in hand you know need to break the input down into tokens so that it can be processed. The component that takes user input and converts it to individual pieces defined by the grammar is called a lexer. Lexers are dumb; they are only concerned with the "outside appearance" of the input and do not attempt to check that it actually makes sense.

Here's a simple lexer I wrote to parse the above grammar (don't use this for anything important; may contain bugs):

$input = 'all ("hi there", (this, that) , other) another';

$tokens = array();
$input = trim($input);
while($input) {
    switch (substr($input, 0, 1)) {
        case '"':
            if (!preg_match('/^"([^"]*)"(.*)$/', $input, $matches)) {
                die; // TODO: error: unterminated string
            }

            $tokens[] = array('string', $matches[1]);
            $input = $matches[2];
            break;
        case '(':
            $tokens[] = array('open', null);
            $input = substr($input, 1);
            break;
        case ')':
            $tokens[] = array('close', null);
            $input = substr($input, 1);
            break;
        case ',':
            $tokens[] = array('comma', null);
            $input = substr($input, 1);
            break;
        default:
            list($word, $input) = array_pad(
                preg_split('/(?=[^a-zA-Z])/', $input, 2),
                2,
                null);
            $tokens[] = array('word', $word);
            break;
    }
    $input = trim($input);
}

print_r($tokens);

Building a parser

Having done this, the next step is to build a parser: a component that inspects the lexed input and converts it to the desired format. A parser is smart; in the process of converting the input it also makes sure that the input is well-formed by the grammar's rules.

Parsers are commonly implemented as state machines (also known as finite state machines or finite automata) and work like this:

The parser has a state; this is usually a number in an appropriate range, but each state is also described with a more human-friendly name.
There is a loop that reads reads lexed tokens one at a time. Based on the current state and the value of the token, the parser may decide to do one or more of the following:
1. take some action that affects its output
2. change its state to some other value
3. decide that the input is badly formed and produce an error

¹ Parser generators are programs whose input is a formal grammar and whose output is a lexer and a parser you can "just add water" to: just extend the code to perform "take some action" depending on the type of token; everything else is already taken care of. A quick search on this subject gives led PHP Lexer and Parser Generator?

171

answered Nov 07 '22 04:11

Jon

There's no question that you should write parser if you are building syntax tree. But if you just need to parse this sample input regex still might be a tool:

<?php
$str = 'all, ("hi there", (these, that) , other), another';

$str = preg_replace('/\, /', ',', $str); //get rid off extra spaces
/*
 * get rid off undefined constants with surrounding them with quotes
*/
$str = preg_replace('/(\w+),/', '\'$1\',', $str);
$str = preg_replace('/(\w+)\)/', '\'$1\')', $str);
$str = preg_replace('/,(\w+)/', ',\'$1\'', $str);

$str = str_replace('(', 'array(', $str);

$str = 'array('.$str.');';

echo '<pre>';
eval('$res = '.$str); //eval is evil.
print_r($res); //print the result

Demo.

Note: If input will be malformed regex will definitely fail. I am writing this solution just in a case you need fast script. Writing lexer and parser is time-consuming work, that will need lots of research.

answered Nov 07 '22 02:11

Leri

Related questions
                            
                                How to make a variable private to a trait?
                            
                                Is there a shortcut for inserting PHP's object operator upon code completion in PhpStorm?
                            
                                Overwrite laravel 5 helper function
                            
                                Quickest PHP equivalent of javascript `var a = var1||var2||var3;` expression
                            
                                $_POST not retrieving data from Javascript's Fetch()
                            
                                Set expiry time for laravel jwt dynamically
                            
                                Laravel vagrant up not working - Errno::EADDRNOTAVAIL
                            
                                How to use default value when `null` is given for a nullable function parameter?
                            
                                Private access in inheritance
                            
                                Simple User management example for Google App Engine?
                            
                                Simplest way to get a complete list of all the UTF-8 whitespace characters in PHP
                            
                                How to get error result of unlink
                            
                                Remove all non-matching characters in PHP string?
                            
                                PHP - override existing function [duplicate]
                            
                                How to get an integer from MySQL as integer in PHP?
                            
                                Facebook Javascript vs PHP SDK
                            
                                try/catch doesn't work in PHP
                            
                                PHP Inheritance and Static Methods and Properties
                            
                                .htaccess in subdirectory 'overriding' parent htaccess
                            
                                Run unit test when saving file in PhpStorm

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

PHP string console parameters to array

Tags:

arrays

regex

php

explode

preg-match