
Can prettify.js be extended to support Mathematica?

Preface

Since the Mathematica support for google-code-prettify was mainly developed for the new Mathematica.Stackexchange site, please see also the discussion here.

Introduction

I have no deep knowledge of all of this, but I once wrote a CWEB plugin for IDEA to get my code highlighted there. In an IDE, highlighting is not a one-step process: it is divided into several steps, and each step adds more highlighting abilities. Let me explain this a bit, so that I can later give reasons why some things are (imho) not possible for the kind of code highlighter we need here.

First, the code is split into tokens, which are the smallest units of a programming language. After this lexing step you can categorize intervals of your code as e.g. whitespace, literal, string, comment, and so on. The lexer eats the source code by testing regular expressions, storing the token type for the matched text span, and stepping forward in the code.
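The loop described above can be sketched in a few lines of JavaScript. This is only a minimal illustration, not the code prettify.js uses: the rule table, the token type names and the `tokenize` function are all made up for this example.

```javascript
// Hypothetical rule table: [tokenType, regex anchored at the start of the remaining input].
const rules = [
  ['whitespace', /^\s+/],
  ['comment', /^\(\*[\s\S]*?\*\)/],
  ['string', /^"(?:[^"\\]|\\[\s\S])*"/],
  ['number', /^\d+(?:\.\d*)?/],
  ['identifier', /^[a-zA-Z$][a-zA-Z0-9$]*/],
];

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    const rest = source.slice(pos);
    // Fallback: consume one character so the loop always advances.
    let type = 'other';
    let text = rest[0];
    // The first rule that matches wins, so the order of the table matters.
    for (const [ruleType, regex] of rules) {
      const m = regex.exec(rest);
      if (m) {
        type = ruleType;
        text = m[0];
        break;
      }
    }
    // Store the token type for the text span, then step forward.
    tokens.push({ type, text, start: pos, end: pos + text.length });
    pos += text.length;
  }
  return tokens;
}

const sampleSource = 'x = "hi" (* note *)';
const sampleTokens = tokenize(sampleSource);
console.log(sampleTokens.map((t) => t.type).join(' '));
```

Because every character ends up in exactly one token, joining the token texts gives back the original source, which is a handy sanity check for any lexer of this kind.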

After this lexical scan, the source code can be parsed using the rules of the programming language, the tokens and the underlying code. For instance, if we have a token Plus of type Keyword, then we know that brackets and parameters should follow. If not, the syntax is not correct. What you build with this parsing is called an AST (abstract syntax tree), and it looks basically like the TreeForm of Mathematica syntax.

With a nicely designed language, like Java for instance, it is possible to check the code while typing and make it almost impossible to write syntactically wrong code.

prettify.js and Mathematica Code

First, prettify.js implements only a lexical scanner, but no parser. I'm pretty sure that a parser would be impossible anyway, given the time constraints for displaying a web page. So let me explain which features are not possible/feasible with prettify.js:

Also, you might notice some of the variables highlighted in orange – I purposefully didn't include that as a requirement, as I think that's going to be a lot harder to do without a parser that knows Mathematica.

Right, because the highlighting of these variables depends on the context. You have to know that you are inside a Table construct or something like that.

Hacking prettify.js

I think hacking an extension for prettify.js is not so hard. I'm an absolute regular expression noob, so be prepared for what follows.

We don't need so much stuff for a simple Mathematica lexer. We have whitespace, comments, string-literals, braces, a lot of operators, usual literals like variables and a giant list of keywords.

Let's start with the keywords in JavaScript regexp form:

Export["google-code-prettify/keywordsmma.txt", 
   StringJoin @@ Riffle[Apply[StringJoin, 
         Partition[Riffle[Names[RegularExpression["[A-Z].*"]], 
             "|"], 100], {1}], "'+ \n '"], "TEXT"]
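On the JavaScript side, such an alternation string can be turned into a keyword matcher roughly as follows. This is a sketch under assumptions: the short `keywords` array stands in for the exported list, and the escaping helper and the trailing lookahead are additions of this example, not something taken from lang-mma.js.

```javascript
// Hypothetical short list; the real one comes from Names[...] in Mathematica.
const keywords = ['Plus', 'PlusMinus', 'Table', 'Plot', '$Failed'];

// Escape regex metacharacters such as `$` before joining the names.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// The negative lookahead rejects matches that are only a prefix of a longer symbol.
const keywordRegex = new RegExp(
  '^(?:' + keywords.map(escapeRegExp).join('|') + ')(?![a-zA-Z0-9$])'
);

const matchKeyword = (s) => {
  const m = keywordRegex.exec(s);
  return m ? m[0] : null;
};

console.log(matchKeyword('PlusMinus[a]')); // PlusMinus
console.log(matchKeyword('Plotter[x]'));   // null
```

Without the lookahead, the order of the alternatives would decide whether PlusMinus is highlighted as a whole or only its Plus prefix.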

The regular expression for whitespace and string-literals can be copied from another language. Comments are matched by something like

/^\(\*[\s\S]*?\*\)/

This fails if we have comments inside comments, but for the moment I don't care. We have braces and brackets

/^(?:\[|\]|{|}|\(|\))/
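Coming back to the comment regex for a moment: the limitation with comments inside comments can be seen in a small sketch. The regex is the one from the text; `matchNestedComment` is a hypothetical depth-counting helper showing one possible way around it, and it is not part of prettify.js.

```javascript
const commentRegex = /^\(\*[\s\S]*?\*\)/;
const nestedSample = '(* outer (* inner *) still comment *) x';

// The non-greedy regex stops at the first "*)".
const regexMatch = commentRegex.exec(nestedSample)[0];

// Hypothetical helper: length of a (possibly nested) comment at the start
// of src, or 0 if src does not start with a terminated comment.
function matchNestedComment(src) {
  if (!src.startsWith('(*')) return 0;
  let depth = 0;
  let i = 0;
  while (i < src.length) {
    if (src.startsWith('(*', i)) {
      depth++;
      i += 2;
    } else if (src.startsWith('*)', i)) {
      depth--;
      i += 2;
      if (depth === 0) return i;
    } else {
      i++;
    }
  }
  return 0; // unterminated comment
}

const nestedMatch = nestedSample.slice(0, matchNestedComment(nestedSample));
console.log(regexMatch);  // (* outer (* inner *)
console.log(nestedMatch); // (* outer (* inner *) still comment *)
```

A counter like this cannot be expressed as a single JavaScript regular expression, which is why a purely regex-based lexer has to live with the limitation.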

We have something like blub_boing which should be matched separately.

/^[a-zA-Z$]+[a-zA-Z0-9$]*_+([a-zA-Z$]+[a-zA-Z0-9$]*)*/
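To get a feeling for what this regex accepts, it can be run against a few inputs. The sample strings and the `matchPattern` helper are only for illustration.

```javascript
const patternRegex = /^[a-zA-Z$]+[a-zA-Z0-9$]*_+([a-zA-Z$]+[a-zA-Z0-9$]*)*/;

const matchPattern = (s) => {
  const m = patternRegex.exec(s);
  return m ? m[0] : null;
};

const patternSamples = ['blub_boing', 'blub__Integer', 'a1_', 'num_?NumericQ', '_Integer', 'plain'];
const patternResults = patternSamples.map(matchPattern);

console.log(patternResults);
// [ 'blub_boing', 'blub__Integer', 'a1_', 'num_', null, null ]
```

Note that an anonymous blank such as _Integer is not matched, because the regex requires a name in front of the underscore, and that a plain symbol falls through to the later rules.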

We have the slots #, ##, #1, ##9 (currently only one digit can follow)

/^#+[0-9]?/
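The one-digit restriction is easy to see when the regex is tried on a slot like #12. The second regex below is a hypothetical variant that accepts any number of digits; it is not what the answer uses.

```javascript
const slotRegex = /^#+[0-9]?/;      // from the text: at most one digit
const slotRegexMulti = /^#+[0-9]*/; // hypothetical variant: any number of digits

const firstMatch = (re, s) => {
  const m = re.exec(s);
  return m ? m[0] : null;
};

const oneDigit = ['#', '##', '#1', '##9', '#12'].map((s) => firstMatch(slotRegex, s));
const multiDigit = firstMatch(slotRegexMulti, '#12');

console.log(oneDigit);   // [ '#', '##', '#1', '##9', '#1' ]
console.log(multiDigit); // #12
```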

We have variable names and other literals. They must start with a letter or $, followed by any number of letters, digits and $. Currently \[Gamma] is not matched as one literal, but for the moment that's ok.

/^[a-zA-Z$]+[a-zA-Z0-9$]*/

And we have operators (I'm not sure this list is complete).

/^(?:\+|\-|\*|\/|,|;|\.|:|@|~|=|\>|\<|&|\||_|`|\^)/
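Put together, the rules could be registered with prettify.js roughly like this. Treat it as a sketch under assumptions rather than the contents of lang-mma.js: the function name, the 'mma' extension name, the string regex and the choice of style for most rules are guesses made for this example (only the tag style for brackets is confirmed by the debugging output shown later), and the long keyword rule is left out. The calls `PR.createSimpleLexer` and `PR.registerLangHandler` and the `PR_*` style constants are the ones prettify.js language handlers normally use.

```javascript
// `PR` is the global object that prettify.js exports; it is passed in here
// so the function can also be exercised outside a browser.
function registerMathematicaLexer(PR) {
  // Rules that can be selected directly from the first character of the input.
  // Entry layout: [style, regex, unused, characters that can start a match].
  const shortcutPatterns = [
    [PR.PR_PLAIN, /^[\t\n\r \xA0]+/, null, '\t\n\r \xA0'],
    [PR.PR_STRING, /^"(?:[^"\\]|\\[\s\S])*(?:"|$)/, null, '"'], // assumed string rule
  ];
  // Rules tried in order when no shortcut applies; the order matters.
  const fallthroughPatterns = [
    [PR.PR_COMMENT, /^\(\*[\s\S]*?\*\)/],
    [PR.PR_TAG, /^(?:\[|\]|{|}|\(|\))/],
    [PR.PR_LITERAL, /^[a-zA-Z$]+[a-zA-Z0-9$]*_+([a-zA-Z$]+[a-zA-Z0-9$]*)*/],
    [PR.PR_LITERAL, /^#+[0-9]?/],
    [PR.PR_PLAIN, /^[a-zA-Z$]+[a-zA-Z0-9$]*/],
    [PR.PR_PUNCTUATION, /^(?:\+|\-|\*|\/|,|;|\.|:|@|~|=|\>|\<|&|\||_|`|\^)/],
  ];
  PR.registerLangHandler(
    PR.createSimpleLexer(shortcutPatterns, fallthroughPatterns),
    ['mma'] // assumed language extension name
  );
}
```

In a page that has already loaded prettify.js, one would call registerMathematicaLexer(PR) and mark code blocks with class="prettyprint lang-mma". The comment rule has to come before the bracket rule, otherwise the opening parenthesis of a comment would be consumed as a bracket.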

Update

I cleaned things up a bit, did some debugging and created a color style which looks beautiful to me. As far as I can see, the following works correctly:

  • All system symbols which can be found through Names[RegularExpression["[A-Z].*"]] are matched and highlighted in blue
  • Braces and brackets are black with a bold font weight. This was a suggestion from Szabolcs, and I like it very much since it definitely adds some energy to the appearance of the code
  • Patterns, as they appear in function definitions, and the slots of pure functions are highlighted in green. This was suggested by Yoda and matches the highlighter in the Mathematica frontend. Patterns are only green in combination with a variable, as in blub__Integer, a1_ or b34_Integer32. With test functions for the pattern, as in num_?NumericQ, only the part in front of the question mark is green.
  • Comments and Strings have the same color. Comments and strings can go over several lines. Strings can include backslashed quotes. Comments cannot be nested.
  • For the coloring I consistently used the ColorData[1] scheme to ensure the colors look nice side by side.

Currently it looks like this:

[screenshot of the highlighted code]

Testing and debugging

Szabolcs asked whether and how it is possible to test this. This is easy: you need my google-code-prettify source (where can I put this so that everyone has access?). Unpack the sources and open the file tests/mathematica_test.html in a web browser. This file itself loads the files src/prettify.js, src/lang-mma.js and src/prettify-mma-1.css.

  • in lang-mma.js you find the regular expressions the lexer uses when splitting the code into tokens.
  • in prettify-mma-1.css you find the style definitions I use

To test your own code, simply open mathematica_test.html in an editor and paste your stuff between the pre tags. Reload the page and your code should appear.

Debugging: If the highlighter is not working correctly, you can debug with an IDE or with Google Chrome. In Chrome, select the word where the highlighter starts to fail, right-click and choose Inspect Element. What you then see is the underlying HTML highlight markup. There you can see every single token and its type. It looks like this:

<span class="tag">[</span>

You see that the open bracket is of type tag. This matches the regexp definition I made in lang-mma.js. In Chrome it is even possible to browse the JS code, set breakpoints and debug it while reloading your page.


Local installation for Google Chrome and Firefox

Tim Stone was kind enough to write a script which injects the highlighter while sites under http://stackoverflow.com/questions/ are loading. As soon as google-code-prettify is turned on for mathematica.stackexchange.com, it should work there too. I adapted this script to use my lexical scanning rules and colors. I heard that the script does not always work in Firefox, but this is how to install it:

  • Chrome: Follow this link https://github.com/halirutan/Mathematica-Source-Highlighting/raw/master/mathematica-source-highlighter.user.js and you should be prompted whether you want to install this extension.
  • Firefox: ensure you have the Greasemonkey plugin installed. Then download the same link as for Chrome.
  • Now you are set up and when you reload this page, comments, kernel-functions, strings and patterns should be highlighted correctly.

Versions

Under https://github.com/halirutan/Mathematica-Source-Highlighting/raw/master/mathematica-source-highlighter.user.js you will always find the most recent version. Here is some change history.

  • 02/23/2013 Updated the lists of symbols and keywords to Mathematica version 9.0.1
  • 09/02/2012 some minor issues with the coloring of Mathematica-patterns were fixed. For a detailed overview of features with Pattern-operator : see also the discussion here
  • 02/02/2012 support of many number input formats like .123`10.2 or 1.2`100.3*^-12, highlighting of In[23] and Out[4], ::usage or other messages like blub::boing, highlighting of patterns like ProblemTest[prob:(findp_[pfun_, pvars_, {popts___}, ___]), opts___], bug-fixes (I checked the parser against 3500 lines of package code from the AddOns directory. It took about 3-4 sec to run, which should be more than fast enough for our purposes.)
  • 01/30/2012 Fixed missing '?' in the operator list. Included named characters like \[Gamma] to give a complete match for such symbols. Added $variables to the keyword list. Improved the matching of patterns. Added matching of context constructions like Developer`PackedArrayQ. Switched the color scheme due to many requests: now it's like in the Mathematica frontend, with keywords black and variables blue.
  • 01/29/2012 Tim hacked the injecting code. Now the highlighting works on mathematica.stackexchange too.
  • 01/25/2012 Added the recognition of Mathematica-numbers. This should now highlight things like {1, 1.0, 1., .12, 16^^1.34f, ...}. Additionally it should recognize the backtick behind a number. I switched comments and strings to gray and use a dark red for the numbers.
  • 01/23/2012 Initial version. Capabilities are described under section Update.

Not exactly what you are asking for, but I created a similar extension for MATLAB (based on the excellent work already done here). The project is hosted on github.

The script should solve some of the issues common for MATLAB code on Stack Overflow:

  • comments (no need to use tricks like %# ..)
  • transpose operator (single quote) is correctly recognized as such (confused with quoted strings by the default prettifier)
  • highlighting of popular built-in functions
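The transpose problem can be illustrated with a common heuristic, sketched here in JavaScript. This is not the implementation used by the project; the function name and the exact character class are assumptions of this example.

```javascript
// Heuristic sketch: a single quote is treated as a transpose when it directly
// follows an identifier character, a closing bracket, a dot, or another quote.
// It must only be applied at positions that are not already inside a string.
function isTranspose(src, quoteIndex) {
  if (quoteIndex === 0) return false;
  return /[a-zA-Z0-9_)\]}.']/.test(src[quoteIndex - 1]);
}

console.log(isTranspose("a'", 1));       // true
console.log(isTranspose("x = 'hi'", 4)); // false
```

A quote after whitespace or an operator is then taken as the start of a string, which is the case the default prettifier gets wrong in the other direction.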

Keep in mind the syntax highlighting is not perfect; among other things, it fails on nested block comments (I can live with that for now). As always, comments/fixes/issues are welcome.

A separate userscript is included; it allows switching the language used, as seen in the screenshots below:

[screenshots: before and after]

For those interested, a third userscript is provided, adapted to work on the "MATLAB Answers" website.


TL;DR

Install the userscript for SO directly from:

https://github.com/amroamroamro/prettify-matlab/raw/master/js/prettify-matlab.user.js