
PHP syntax highlighting [closed]

I'm searching for a PHP syntax highlighting engine that can be customized (i.e. I can provide my own tokenizers for new languages) and that can handle several languages simultaneously (i.e. on the same output page). This engine has to work well together with CSS classes, i.e. it should format the output by inserting <span> elements that are adorned with class attributes. Bonus points for an extensible schema.
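To make the requirements above concrete, here is a minimal sketch of the kind of engine being asked for. It is not taken from any existing library: the `$languages` table, the class names and the `highlight_sketch()` function are all hypothetical. Each language supplies its own patterns, but they all map to the same CSS class names.

```php
<?php
// Hypothetical rule sets: every language uses the SAME class names, so a
// single stylesheet covers them all. Fragments must not contain "~".
$languages = array(
    'php' => array(
        'comment' => '//[^\n]*',
        'string'  => "'[^']*'",
        'keyword' => '\b(?:echo|return|function)\b',
    ),
    'sql' => array(
        'comment' => '--[^\n]*',
        'string'  => "'[^']*'",
        'keyword' => '(?i:\b(?:select|from|where)\b)',
    ),
);

// Wraps each recognised token in <span class="..."> and escapes the rest.
function highlight_sketch($code, array $rules) {
    $parts = array();
    foreach ($rules as $class => $fragment) {
        $parts[] = '(?<' . $class . '>' . $fragment . ')'; // one named group per class
    }
    $regex = '~' . implode('|', $parts) . '~';

    $out = '';
    $pos = 0;
    while (preg_match($regex, $code, $m, PREG_OFFSET_CAPTURE, $pos)) {
        list($text, $start) = $m[0];
        if ($text === '') {
            break; // guard against rules that match the empty string
        }
        // Plain text between the previous token and this one
        $out .= htmlspecialchars(substr($code, $pos, $start - $pos), ENT_NOQUOTES);
        foreach ($rules as $class => $unused) {
            // Unmatched groups are reported with offset -1 (or omitted)
            if (isset($m[$class]) && $m[$class][1] !== -1) {
                $out .= '<span class="' . $class . '">'
                      . htmlspecialchars($text, ENT_NOQUOTES) . '</span>';
                break;
            }
        }
        $pos = $start + strlen($text);
    }
    // Whatever follows the last token is plain text
    return $out . htmlspecialchars(substr($code, $pos), ENT_NOQUOTES);
}

// Two languages on the same page, styled by the same CSS classes
echo '<pre>' . highlight_sketch("echo 'hi'; // greet", $languages['php']) . "</pre>\n";
echo '<pre>' . highlight_sketch("SELECT name FROM t -- all rows", $languages['sql']) . "</pre>\n";
```

Adding a language in this sketch means adding one more entry to the table. The output markup and the stylesheet stay the same.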

I am not looking for a client-side syntax highlighting script (JavaScript).

So far, I'm stuck with GeSHi. Unfortunately, GeSHi fails abysmally for several reasons. The main one is that its language files define completely different, inconsistent styles. I've spent hours trying to refactor the language definitions down to a common denominator, but since most definition files are quite bad in themselves, I'd finally like to switch.

Ideally, I'd like to have an API similar to CodeRay, Pygments or the JavaScript dp.SyntaxHighlighter.

Clarification:

I'm looking for code highlighting software written in PHP, not software that highlights PHP (since I need to use it from inside PHP).

Konrad Rudolph asked Oct 23 '08 15:10




4 Answers

Since no existing tool satisfied my needs, I wrote my own. Lo and behold:

Hyperlight

Usage is extremely easy: just use

 <?php hyperlight($code, 'php'); ?> 

to highlight code. Writing new language definitions is relatively easy, too – using regular expressions and a powerful but simple state machine. By the way, I still need a lot of definitions so feel free to contribute.
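As an illustration of the general idea (regular expressions driven by a small state machine), here is a minimal sketch. It is not Hyperlight's actual definition format: the `$states` table layout, the class names and the `highlight_with_states()` function are assumptions made up for this example.

```php
<?php
// Hypothetical state table. Each rule is array(regex, CSS class, next state).
// \G anchors every regex at the current cursor position.
$states = array(
    'code' => array(
        array('~\G"~',                       'string',  'string'), // opening quote
        array('~\G\b(?:if|else|return)\b~',  'keyword', 'code'),
        array('~\G\b\d+\b~',                 'number',  'code'),
    ),
    'string' => array(
        array('~\G\\\\.~s',                  'escape',  'string'), // backslash escape
        array('~\G"~',                       'string',  'code'),   // closing quote
        array('~\G[^"\\\\]+~',               'string',  'string'), // string body
    ),
);

function highlight_with_states($code, array $states, $state = 'code') {
    $out   = '';
    $plain = ''; // unstyled text collected since the last token
    $pos   = 0;
    $len   = strlen($code);
    while ($pos < $len) {
        $matched = false;
        foreach ($states[$state] as $rule) {
            list($regex, $class, $next) = $rule;
            if (preg_match($regex, $code, $m, 0, $pos) && $m[0] !== '') {
                // Flush pending plain text, then emit the styled token
                $out .= htmlspecialchars($plain, ENT_NOQUOTES)
                      . '<span class="' . $class . '">'
                      . htmlspecialchars($m[0], ENT_NOQUOTES) . '</span>';
                $plain   = '';
                $pos    += strlen($m[0]);
                $state   = $next; // the rule decides which state comes next
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            $plain .= $code[$pos]; // no rule applies: keep the byte as plain text
            $pos++;
        }
    }
    return $out . htmlspecialchars($plain, ENT_NOQUOTES);
}

echo highlight_with_states('if "a\\"b" 42', $states), "\n";
```

Because the keyword rule only exists in the `code` state, an `if` inside a string literal is never marked as a keyword. That is the main thing a state machine buys over a flat list of regular expressions.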

Konrad Rudolph answered Sep 28 '22 09:09


[I marked this answer as Community Wiki because you're specifically not looking for Javascript]

http://softwaremaniacs.org/soft/highlight/ is a syntax highlighting library written in JavaScript. It supports the following languages, PHP among them:

Python, Ruby, Perl, PHP, XML, HTML, CSS, Django, Javascript, VBScript, Delphi, Java, C++, C#, Lisp, RenderMan (RSL and RIB), Maya Embedded Language, SQL, SmallTalk, Axapta, 1C, Ini, Diff, DOS .bat, Bash

It uses <span class="keyword"> style markup.

It has also been integrated in the dojo toolkit (as a dojox project: dojox.lang.highlight)

Strictly speaking, JavaScript is not limited to the client side: server-side JavaScript engine/platform combinations exist too, though they are not the most popular way to run a web server.

micahwittman answered Sep 28 '22 08:09


I found this simple generic syntax highlighter written in PHP here and modified it a bit:

<?php

/**
 * Original => http://phoboslab.org/log/2007/08/generic-syntax-highlighting-with-regular-expressions
 * Usage => `echo SyntaxHighlight::process('source code here');`
 */

class SyntaxHighlight {
    public static function process($s) {
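        // ENT_COMPAT escapes double quotes only. Since PHP 8.1 the default
        // also escapes single quotes, which the string pattern below relies on.
        $s = htmlspecialchars($s, ENT_COMPAT);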

        // Workaround for escaped backslashes
        $s = str_replace('\\\\','\\\\<e>', $s); 

        $tokens = array(); // This array will be filled from the regexp-callback

        // Comments/Strings: swap each one for a placeholder first, so the
        // expressions below cannot touch their contents.
        // The original relied on the /e modifier, which was removed in PHP 7;
        // preg_replace_callback() does the same job.
        $s = preg_replace_callback(
            '/(
                \/\*.*?\*\/|
                \/\/.*?\n|
                \#.[^a-fA-F0-9]+?\n|
                \&lt;\!\-\-[\s\S]+\-\-\&gt;|
                (?<!\\\)&quot;.*?(?<!\\\)&quot;|
                (?<!\\\)\'(.*?)(?<!\\\)\'
            )/isx',
            function ($m) use (&$tokens) {
                return self::replaceId($tokens, $m[1]);
            },
            $s
        );

        $regexp = array(

            // Punctuations
            '/([\-\!\%\^\*\(\)\+\|\~\=\`\{\}\[\]\:\"\'<>\?\,\.\/]+)/'
            => '<span class="P">$1</span>',

            // Numbers (also look for Hex)
            '/(?<!\w)(
                (0x|\#)[\da-f]+|
                \d+|
                \d+(px|em|cm|mm|rem|s|\%)
            )(?!\w)/ix'
            => '<span class="N">$1</span>',

            // Make the bold assumption that an
            // all uppercase word has a special meaning
            '/(?<!\w|>|\#)(
                [A-Z_0-9]{2,}
            )(?!\w)/x'
            => '<span class="D">$1</span>',

            // Keywords
            '/(?<!\w|\$|\%|\@|>)(
                and|or|xor|for|do|while|foreach|as|return|die|exit|if|then|else|
                elseif|new|delete|try|throw|catch|finally|class|function|string|
                array|object|resource|var|bool|boolean|int|integer|float|double|
                real|string|array|global|const|static|public|private|protected|
                published|extends|switch|true|false|null|void|this|self|struct|
                char|signed|unsigned|short|long
            )(?!\w|=")/ix'
            => '<span class="K">$1</span>',

            // PHP/Perl-Style Vars: $var, %var, @var
            '/(?<!\w)(
                (\$|\%|\@)(\-&gt;|\w)+
            )(?!\w)/ix'
            => '<span class="V">$1</span>'

        );

        $s = preg_replace(array_keys($regexp), array_values($regexp), $s);

        // Paste the comments and strings back in again
        $s = str_replace(array_keys($tokens), array_values($tokens), $s);

        // Delete the "Escaped Backslash Workaround Token" (TM)
        // and replace tabs with four spaces.
        $s = str_replace(array('<e>', "\t"), array('', '    '), $s);

        return '<pre><code>' . $s . '</code></pre>';
    }

    // Regexp-Callback to replace every comment or string with a uniqid and save
    // the matched text in an array
    // This way, strings and comments will be stripped out and won't be processed
    // by the other expressions searching for keywords etc.
    private static function replaceId(&$a, $match) {
        $id = "##r" . uniqid() . "##";

        // String or Comment? (a leading "#" can only come from the hash-comment pattern)
        if(substr($match, 0, 2) == '//' || substr($match, 0, 2) == '/*' || substr($match, 0, 1) == '#' || substr($match, 0, 7) == '&lt;!--') {
            $a[$id] = '<span class="C">' . $match . '</span>';
        } else {
            $a[$id] = '<span class="S">' . $match . '</span>';
        }
        return $id;
    }
}

?>

Demo: http://phpfiddle.org/lite/code/1sf-htn


Update

I just created a PHP port of my own JavaScript generic syntax highlighter here → https://github.com/taufik-nurrohman/generic-syntax-highlighter/blob/master/generic-syntax-highlighter.php

How to use:

<?php require 'generic-syntax-highlighter.php'; ?>
<pre><code><?php echo SH('&lt;div class="foo"&gt;&lt;/div&gt;'); ?></code></pre>

Taufik Nurrohman answered Sep 28 '22 10:09


It might be worth looking at PEAR's Text_Highlighter package (documentation).

I don't think its default HTML output is exactly what you want, but it does provide extensive capabilities for customisation (i.e. you can create different renderers/parsers).

Tom Haigh answered Sep 28 '22 09:09