Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing CSS by regex

I'm creating a CSS editor and am trying to create a regular expression that can get data from a CSS document. This regex works if I have one property but I can't get it to work for all properties. I'm using preg/perl syntax in PHP.

Regex

(?<selector>[A-Za-z]+[\s]*)[\s]*{[\s]*((?<properties>[A-Za-z0-9-_]+)[\s]*:[\s]*(?<values>[A-Za-z0-9#, ]+);[\s]*)*[\s]*}

Test case

body { background: #f00; font: 12px Arial; }

Expected Outcome

Array(
    [0] => Array(
            [0] => body { background: #f00; font: 12px Arial; }
            [selector] => Array(
                [0] => body
            )
            [1] => Array(
                [0] => body
            )
            [2] => font: 12px Arial; 
            [properties] => Array(
                [0] => font
            )
            [3] => Array(
                [0] => font
            )
            [values] => Array(
                [0] => 12px Arial
                [1] => background: #f00
            )
            [4] => Array(
                [0] => 12px Arial
                [1] => background: #f00
            )
        )
)

Real Outcome

Array(
    [0] => Array
        (
            [0] => body { background: #f00; font: 12px Arial; }
            [selector] => body 
            [1] => body 
            [2] => font: 12px Arial; 
            [properties] => font
            [3] => font
            [values] => 12px Arial
            [4] => 12px Arial
        )
    )

Thanks in advance for any help - this has been confusing me all afternoon!

like image 848
Ross Avatar asked Oct 25 '08 20:10

Ross


1 Answers

You are trying to pull structure out of the data, and not just individual values. Regular expressions might could be painfully stretched to do the job, but you are really entering parser territory, and should be pulling out the big guns, namely parsers.

I have never used the PHP parser generating tools, but they look okay after a light scan of the docs. Check out LexerGenerator and ParserGenerator. LexerGenerator will take a bunch of regular expressions describing the different types of tokens in a language (in this case, CSS) and spit out some code that recognizes the individual tokens. ParserGenerator will take a grammar, a description of what things in a language are made up of what other things, and spit out a parser, code that takes a bunch of tokens and returns a syntax tree (the data structure that you are after.

like image 135
Andru Luvisi Avatar answered Oct 05 '22 02:10

Andru Luvisi