
How to highlight all R function names with highlight.js?

I want to extend the highlight.js capabilities for the R language so that (1) all function names that are followed by an opening parenthesis ( and (2) all package names that are followed by the :: or ::: operator are highlighted (as in RStudio, see Fig.1.). The parentheses (, ) and the operators ::, ::: themselves should not be highlighted.

Fig.1. Desired highlighting of R code parts (function and package names).

My example consists of two files: index.html and r.min.js.

HTML file:

<html lang="en-us">
<head> <meta charset="utf-8">
    <link href='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/agate.min.css' rel='stylesheet' type='text/css' />
</head>

<body>

<pre class="r"><code>doc_name &lt;-
    officer::read_docx() %&gt;% 
    flextable:::body_add_flextable(table_to_save) %&gt;% 
    print(target = &quot;word.docx&quot;)

.libPaths()

c("a", "b")

package::function()$field
</code></pre> 

<script src="https://cdn.jsdelivr.net/gh/highlightjs/[email protected]/build/highlight.min.js"></script>
<script src="r.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>

</body>
</html>

r.min.js file:

hljs.registerLanguage("r",function(e){var r="([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";return{c:[e.HCM,{b:r,l:r,k:{keyword:"function if in break next repeat else for return switch while try tryCatch stop warning require library attach detach source setMethod setGeneric setGroupGeneric setClass ...",literal:"NULL NA TRUE FALSE T F Inf NaN NA_integer_|10 NA_real_|10 NA_character_|10 NA_complex_|10"},r:0},{cN:"number",b:"0[xX][0-9a-fA-F]+[Li]?\\b",r:0},{cN:"number",b:"\\d+(?:[eE][+\\-]?\\d*)?L\\b",r:0},{cN:"number",b:"\\d+\\.(?!\\d)(?:i\\b)?",r:0},{cN:"number",b:"\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{b:"`",e:"`",r:0},{cN:"string",c:[e.BE],v:[{b:'"',e:'"'},{b:"'",e:"'"}]},

/* My attempt... */
/* ... to highlight function names between double 
and triple colons and opening parenthesis (in red as symbol): */
{cN:"symbol",b:":::|::",e:"\\(",eB:!0,eE:!0},

/* ... to highlight other function names (in red as symbol): */
{cN:"symbol",  b:"([a-zA-Z]|\.[a-zA-Z.])[a-zA-Z0-9._]*",e:"\\(",eE:!0},

/* ... to highlight package names (in cyan as variable): */
{cN:"variable",b:"(?<!\w)",e:":::|::",eE:!0},

]}});

r.min.js is based on (this file) and contains the highlight.js rules that identify R code elements. The lines I added are below the comment "My attempt...". Meanings of the abbreviations: cN - CSS class name, b - "begins", e - "ends", eB - "exclude begin", eE - "exclude end"; other meanings are explained here.
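To make the minified rules easier to read, here is a small sketch that expands the abbreviated keys into their long names. The mapping is my understanding of how the highlight.js 9.x build shortens option names and covers only the keys that appear in r.min.js; `FULL_NAMES` and `expandMode` are hypothetical helper names, not part of highlight.js.

```javascript
// Abbreviation -> full option name, limited to the keys used in r.min.js.
// Assumption: this mirrors the renaming done by the highlight.js 9.x build.
const FULL_NAMES = {
  c: "contains",
  cN: "className",
  b: "begin",
  e: "end",
  eB: "excludeBegin",
  eE: "excludeEnd",
  k: "keywords",
  l: "lexemes",
  r: "relevance",
  v: "variants",
};

// Recursively rewrite a minified mode object into its readable form.
// Strings, numbers, booleans and RegExp objects are returned unchanged.
function expandMode(value) {
  if (Array.isArray(value)) return value.map(expandMode);
  if (value === null || typeof value !== "object" || value instanceof RegExp) {
    return value;
  }
  const out = {};
  for (const [key, inner] of Object.entries(value)) {
    out[FULL_NAMES[key] || key] = expandMode(inner);
  }
  return out;
}

// The first rule from "My attempt", written with `true` instead of `!0`.
const expanded = expandMode({
  cN: "symbol",
  b: ":::|::",
  e: "\\(",
  eB: true,
  eE: true,
});
console.log(expanded);
```

Read this way, the first added rule says: start at :: or :::, stop at (, and exclude both delimiters from the span styled as symbol.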

The result I get (Fig.2.) is not satisfactory. It seems that the regular expressions I use do not find the correct beginnings and ends of the desired parts of the R code.
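One detail that may contribute to this: two of the added patterns are plain JavaScript string literals with a single backslash, and JavaScript drops the backslash of an unknown escape before the pattern is ever compiled. The sketch below only inspects the strings; it does not model how highlight.js compiles them, and the variable names are hypothetical.

```javascript
// Patterns exactly as typed in the attempt (single backslashes in strings).
const fnPattern = "([a-zA-Z]|\.[a-zA-Z.])[a-zA-Z0-9._]*";
const pkgPattern = "(?<!\w)";

console.log(fnPattern);  // ([a-zA-Z]|.[a-zA-Z.])[a-zA-Z0-9._]*
console.log(pkgPattern); // (?<!w)

// Doubling the backslash (as the original r.min.js rules do) keeps the escape.
const fnPatternEscaped = "([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";
console.log(fnPatternEscaped); // ([a-zA-Z]|\.[a-zA-Z.])[a-zA-Z0-9._]*
```

So the second pattern ends up meaning "not preceded by the letter w" rather than "not preceded by a word character", and the dot in the first one matches any character.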

Fig.2. The result using modified r.min.js

What should be the correct highlight.js code in r.min.js to get the parts of R code highlighted as in RStudio?

GegznaV asked Jun 30 '18 00:06



1 Answer

Sounds like a worthwhile improvement, so I tinkered with it for a while.

In principle, this should be fairly easy.

A regex to capture the package name prefixes could be written like this (demo):

\w+(?=:::?)

and for function names like this (demo):

\.?\w+(?=\()

Unfortunately, these are not so easily applied to highlight.js language parsing rules.
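As a quick check outside of highlight.js, the two lookahead patterns can be run as ordinary JavaScript regexes against the sample R code from the question. This is only a sketch: the global flag and the variable names are my additions.

```javascript
// Package names: word characters followed by :: or ::: (lookahead, not consumed).
const pkgRe = /\w+(?=:::?)/g;
// Function names: optional leading dot, word characters, then "(" (not consumed).
const fnRe = /\.?\w+(?=\()/g;

const code = [
  "officer::read_docx() %>%",
  "flextable:::body_add_flextable(table_to_save) %>%",
  'print(target = "word.docx")',
  ".libPaths()",
].join("\n");

const packages = code.match(pkgRe);
const functions = code.match(fnRe);

console.log(packages);  // [ 'officer', 'flextable' ]
console.log(functions); // [ 'read_docx', 'body_add_flextable', 'print', '.libPaths' ]
```

Because the operators and the parenthesis sit inside lookaheads, they are not part of the match, which is exactly what the question asks for.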

After some trial and error, I settled on the following code, which gives fairly consistent highlighting:

/* ... to highlight other function names (in orange as a keyword): */
{
    cN: "keyword",
    b: /(^|\s*)(:::?|\.)\w+(?=\(|$)/
},
/* ... to highlight package names (in red as meta): */
{
    cN: "meta",
    b: /(^|\s*)\w+(?=:::?|$)/,
    r: 0
},
  • I use the class name (cN, i.e. className) keyword for functions: it fits what they are, and it interferes less with the predefined style for functions.
  • The same goes for package names, where I suggest using the class name meta. This is what other packages use for similar constructs, and again, it gives a more consistent result with built-in styles, e.g. for numbers.
  • I've also added print and c to the list of keywords. The list for the R language is obviously somewhat incomplete. Arguably, every function name (even from 3rd-party packages) should be added as a keyword (this is how some other languages do it), but that's not very practical.

This is what I get.

Sample Code:

hljs.registerLanguage("r",function(e){var r="([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";return{c:[e.HCM,{b:r,l:r,k:{keyword:"function if in break next repeat else for return switch while try tryCatch stop warning require library attach detach source setMethod setGeneric setGroupGeneric setClass c print ...",literal:"NULL NA TRUE FALSE T F Inf NaN NA_integer_|10 NA_real_|10 NA_character_|10 NA_complex_|10"},r:0},{cN:"number",b:"0[xX][0-9a-fA-F]+[Li]?\\b",r:0},{cN:"number",b:"\\d+(?:[eE][+\\-]?\\d*)?L\\b",r:0},{cN:"number",b:"\\d+\\.(?!\\d)(?:i\\b)?",r:0},{cN:"number",b:"\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{b:"`",e:"`",r:0},{cN:"string",c:[e.BE],v:[{b:'"',e:'"'},{b:"'",e:"'"}]},
{cN: "keyword", b: /(^|\s*)(:::?|\.)\w+(?=\(|$)/},
{cN: "meta",b: /(^|\s*)\w+(?=:::?|$)/,r: 0 }, ]}});

hljs.initHighlightingOnLoad();
<html lang="en-us">
<head> <meta charset="utf-8"><link href='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/agate.min.css' rel='stylesheet' type='text/css' />
</head><body>

    <pre class="r"><code>library(officer)
doc_name &lt;-
    officer::read_docx() %&gt;% 
    flextable:::body_add_flextable(table_to_save) %&gt;% 
    print(target = &quot;word.docx&quot;)

.libPaths()
x = 4
c("a", "b")

package::function()$field
</code></pre>
<script src="https://cdn.jsdelivr.net/gh/highlightjs/[email protected]/build/highlight.min.js"></script>

</body></html>

Pretty close, but far from perfect. The main hurdle is that I struggle to fully understand how the parser interprets the patterns. Some of the results make no sense to me, yet they still work.

wp78de answered Nov 15 '22 11:11
