Automatically parsing PHP to separate PHP code from HTML

Tags: php, parsing, code-generation, antlr

I'm working on a large PHP codebase and I'd like to separate the PHP code from the HTML and JavaScript. (I need to run several automatic search-and-replaces on the PHP code, different ones on the HTML, and different ones again on the JS.) Is there a good parser engine that could separate out the PHP for me? I could do this with regular expressions, but they're not perfect. I could perhaps build something in ANTLR, but a good existing solution would be best.

I should make clear: I don't want or need a full PHP parser. I just need to know whether a given token is:

- PHP code
- PHP single-quoted string
- PHP double-quoted string
- PHP comment
- Not PHP, but rather HTML/JavaScript

asked Nov 07 '10 17:11

SRobertJames

2 Answers

How about the tokenizer built right into PHP itself?

The tokenizer functions provide an interface to the PHP tokenizer embedded in the Zend Engine. Using these functions you may write your own PHP source analyzing or modification tools without having to deal with the language specification at the lexical level.

You ask in the comments whether you can regenerate the code from the tokenized output. Yes, you can: all whitespace is preserved as T_WHITESPACE tokens, so concatenating the text of every token reproduces the original source. Here's how you might turn the tokenized output back into code:

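$regenerated = '';

$tokens = token_get_all($code);
foreach ($tokens as $t) {
    if (is_array($t)) {
        // Array tokens have the form [token id, source text, line number].
        switch ($t[0]) {
            case T_CONSTANT_ENCAPSED_STRING:
                // A quoted string without interpolation: process it here.
                break;
            case T_COMMENT:
            case T_DOC_COMMENT:
                // A comment: process it here.
                break;
        }
        $regenerated .= $t[1];
    } else {
        // Single-character tokens such as ';' or '{' arrive as plain strings.
        $regenerated .= $t;
    }
}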
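To map tokens onto the five categories listed in the question, something along these lines might work. It is only a sketch: `classify_tokens()` and the category names are made up for illustration, and it deliberately ignores heredocs, `b"..."` binary strings, and double-quoted strings nested inside `{$...}` expressions, all of which would need extra handling.

```php
<?php
// Hypothetical helper: labels every token with one of the question's categories.
// Returns a list of [category, source text] pairs in source order.
function classify_tokens(string $code): array
{
    $out = [];
    $inDoubleQuotes = false; // true while inside an interpolated "..." string

    foreach (token_get_all($code) as $t) {
        if (!is_array($t)) {
            // Single-character tokens arrive as plain strings. A bare '"'
            // opens or closes a double-quoted string that has interpolation.
            if ($t === '"') {
                $inDoubleQuotes = !$inDoubleQuotes;
                $out[] = ['double_quoted', $t];
            } else {
                $out[] = [$inDoubleQuotes ? 'double_quoted' : 'code', $t];
            }
            continue;
        }

        [$id, $text] = $t;
        if ($id === T_INLINE_HTML) {
            $category = 'html'; // everything outside the PHP tags
        } elseif ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            $category = 'comment';
        } elseif ($id === T_CONSTANT_ENCAPSED_STRING) {
            // A quoted string without interpolation; the first character
            // tells the two quote styles apart.
            $category = $text[0] === "'" ? 'single_quoted' : 'double_quoted';
        } elseif ($inDoubleQuotes) {
            $category = 'double_quoted';
        } else {
            $category = 'code';
        }
        $out[] = [$category, $text];
    }
    return $out;
}

// Usage: print each token next to its category.
$sample = "<p>Hi</p><?php echo 'a'; // note\n\$x = \"b\"; ?>\n<b>bye</b>";
foreach (classify_tokens($sample) as [$category, $text]) {
    printf("%-14s %s\n", $category, json_encode($text));
}
```

Because every token's text is kept, joining the second element of each pair gives back the original source, so you can rewrite only the categories you care about and leave the rest untouched.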

answered Oct 21 '22 22:10

Paul Dixon

To separate the PHP from the rest, PHP's built-in tokenizer is your best choice: see token_get_all().
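As a rough sketch of how that split could drive the separate search-and-replace passes: the tokenizer reports everything outside the PHP tags as `T_INLINE_HTML`, so one callback can handle those chunks and another can handle the PHP tokens. `transform_source()` and the sample replacements are hypothetical names for illustration, and the arrow functions assume PHP 7.4 or later.

```php
<?php
// Hypothetical helper: runs $onPhp over each PHP token and $onHtml over each
// chunk outside the PHP tags (T_INLINE_HTML), then reassembles the file.
function transform_source(string $code, callable $onPhp, callable $onHtml): string
{
    $result = '';
    foreach (token_get_all($code) as $t) {
        $id   = is_array($t) ? $t[0] : null; // plain-string tokens have no id
        $text = is_array($t) ? $t[1] : $t;
        $result .= $id === T_INLINE_HTML ? $onHtml($text) : $onPhp($text);
    }
    return $result;
}

$source = "<p class=\"old\">Hi</p>\n<?php echo old_name('old'); ?>\n";

$converted = transform_source(
    $source,
    // Applied to PHP tokens only: rename a function.
    fn (string $php) => str_replace('old_name', 'new_name', $php),
    // Applied to HTML/JavaScript only: rename a CSS class.
    fn (string $html) => str_replace('class="old"', 'class="new"', $html)
);

echo $converted;
```

Note that the PHP callback sees one token at a time, so a replacement that spans several tokens would need the tokens buffered per PHP block first.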

For the rest, you might be best off with a DOM parser. Isolating the <script> parts (and external scripts, and even onXXXX events) is easy that way.
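A minimal sketch of the DOM approach, assuming the `dom` extension is available and that the PHP blocks have already been removed from the markup. `extract_javascript()` is a hypothetical helper, and the XPath test simply matches any attribute whose name starts with "on", which is a heuristic rather than a list of real event handlers.

```php
<?php
// Hypothetical helper: collects inline <script> bodies, external script URLs,
// and on* event-handler attributes from an HTML string.
function extract_javascript(string $html): array
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them, since
    // real-world markup is rarely clean.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = ['inline' => [], 'external' => [], 'handlers' => []];

    foreach ($dom->getElementsByTagName('script') as $script) {
        if ($script->hasAttribute('src')) {
            $found['external'][] = $script->getAttribute('src');
        } else {
            $found['inline'][] = $script->textContent;
        }
    }

    // Heuristic: any attribute named on... is treated as an event handler.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//@*[starts-with(name(), "on")]') as $attr) {
        $found['handlers'][] = [$attr->ownerElement->nodeName, $attr->name, $attr->value];
    }

    return $found;
}

// Usage
$html = '<html><body><a href="#" onclick="go()">x</a>'
      . '<script src="app.js"></script><script>var a = 1;</script></body></html>';
print_r(extract_javascript($html));
```

The parser repairs unbalanced markup as it loads it, so feeding it partial fragments works for locating scripts but is one reason the serialized tree may not match the input byte for byte.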

It might be tough to rebuild an identical document from a parsed DOM tree, though. I guess it depends on what you need to do with the results and how clean the original HTML is. A regular expression (yuck!) could work better for that part.

answered Oct 21 '22 22:10

Pekka

