I want to extend the highlight.js capabilities for the R language so that (1) all function names that are followed by an opening parenthesis ( and (2) all package names that are followed by the :: and ::: operators are highlighted (as in RStudio, see Fig. 1). The parentheses ( and ) and the operators :: and ::: themselves should not be highlighted.
Fig. 1. Desired highlighting of R code parts (function and package names).
My example consists of two files: index.html and r.min.js.
HTML file:
<html lang="en-us">
<head> <meta charset="utf-8">
<link href='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/agate.min.css' rel='stylesheet' type='text/css' />
</head>
<body>
<pre class="r"><code>doc_name <-
officer::read_docx() %>%
flextable:::body_add_flextable(table_to_save) %>%
print(target = "word.docx")
.libPaths()
c("a", "b")
package::function()$field
</code></pre>
<script src="https://cdn.jsdelivr.net/gh/highlightjs/[email protected]/build/highlight.min.js"></script>
<script src="r.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
</body>
</html>
r.min.js file:
hljs.registerLanguage("r",function(e){var r="([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";return{c:[e.HCM,{b:r,l:r,k:{keyword:"function if in break next repeat else for return switch while try tryCatch stop warning require library attach detach source setMethod setGeneric setGroupGeneric setClass ...",literal:"NULL NA TRUE FALSE T F Inf NaN NA_integer_|10 NA_real_|10 NA_character_|10 NA_complex_|10"},r:0},{cN:"number",b:"0[xX][0-9a-fA-F]+[Li]?\\b",r:0},{cN:"number",b:"\\d+(?:[eE][+\\-]?\\d*)?L\\b",r:0},{cN:"number",b:"\\d+\\.(?!\\d)(?:i\\b)?",r:0},{cN:"number",b:"\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{b:"`",e:"`",r:0},{cN:"string",c:[e.BE],v:[{b:'"',e:'"'},{b:"'",e:"'"}]},
/* My attempt... */
/* ... to highlight function names between double
and triple colons and opening parenthesis (in red as symbol): */
{cN:"symbol",b:":::|::",e:"\\(",eB:!0,eE:!0},
/* ... to highlight other function names (in red as symbol): */
{cN:"symbol", b:"([a-zA-Z]|\.[a-zA-Z.])[a-zA-Z0-9._]*",e:"\\(",eE:!0},
/* ... to highlight package names (in cyan as variable): */
{cN:"variable",b:"(?<!\w)",e:":::|::",eE:!0},
]}});
r.min.js is based on (this file) and contains the highlight.js rules that identify R code elements.
The lines I added are below the comment "My attempt...". Meanings of the abbreviations: cN - CSS class name, b - "begin", e - "end", eB - "exclude begin", eE - "exclude end"; other meanings are explained here.
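To make the minified rules easier to read, the abbreviated keys can be expanded to their long names. The sketch below is only an illustration: the helper expandMode and the table FULL_NAMES are hypothetical names, not part of highlight.js, and the mapping beyond the five keys listed above (c, k, l, r, v) is my assumption about the naming used in the 9.x minified builds.

```javascript
// Hypothetical helper, not part of highlight.js: expands the short keys
// used in r.min.js into the long option names.
// The entries for c, k, l, r and v are assumptions about the 9.x builds.
const FULL_NAMES = new Map([
  ["cN", "className"],
  ["b", "begin"],
  ["e", "end"],
  ["eB", "excludeBegin"],
  ["eE", "excludeEnd"],
  ["c", "contains"],
  ["k", "keywords"],
  ["l", "lexemes"],
  ["r", "relevance"],
  ["v", "variants"],
]);

function expandMode(mode) {
  const out = {};
  for (const [key, value] of Object.entries(mode)) {
    const longKey = FULL_NAMES.get(key) || key;
    // Only "contains" and "variants" hold nested rules; other values are copied as-is.
    const nested = (key === "c" || key === "v") && Array.isArray(value);
    out[longKey] = nested ? value.map(expandMode) : value;
  }
  return out;
}

// The first rule from "My attempt", written out in full.
const expanded = expandMode({ cN: "symbol", b: ":::|::", e: "\\(", eB: true, eE: true });
console.log(expanded);
```

Read this way, the first added rule says: start at :: or :::, stop at (, and exclude both delimiters from the highlighted span.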
The result I get (Fig. 2) is not satisfactory. It seems that the regular expressions I use do not find the correct beginnings and ends of the desired parts of the R code.
Fig.2. The result using modified r.min.js
What should be the correct highlight.js code in r.min.js to get the parts of R code highlighted as in RStudio?
Sounds like a worthwhile improvement, so I tinkered for a while with it.
This should be fairly easy. A regex to capture the package name prefixes could be written like this (demo):
\w+(?=:::?)
and for function names like this (demo):
\.?\w+(?=\()
Unfortunately, it is not so easily applied to highlight.js language parsing rules.
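As a quick check outside highlight.js, the two patterns can be run against a few lines of the sample R code with plain JavaScript. This is only a sketch: the constant names and the global flag are my additions, and it says nothing about how the highlight.js parser would apply the same patterns.

```javascript
// The two patterns quoted above, with the global flag added to collect every match.
const PACKAGE_RE = /\w+(?=:::?)/g;    // a word followed by :: or :::
const FUNCTION_RE = /\.?\w+(?=\()/g;  // an optional leading dot, a word, then (

// A few lines taken from the sample R code in the question.
const sample = [
  "officer::read_docx() %>%",
  "flextable:::body_add_flextable(table_to_save) %>%",
  'print(target = "word.docx")',
  ".libPaths()",
].join("\n");

const packages = sample.match(PACKAGE_RE) || [];
const functions = sample.match(FUNCTION_RE) || [];

console.log(packages);  // package name prefixes
console.log(functions); // function names
```

Because \w does not match a dot, a dotted R name such as read.csv( would only be matched from its last dot onward (.csv), so the function pattern is a rough approximation of R identifiers.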
After some trial and error, I settled on the following code, which gives fairly consistent highlighting:
/* ... to highlight other function names (in orange as a keyword): */
{
cN: "keyword",
b: /(^|\s*)(:::?|\.)\w+(?=\(|$)/
},
/* ... to highlight package names (in red as meta): */
{
cN: "meta",
b: /(^|\s*)\w+(?=:::?|$)/,
r: 0
},
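To get a feeling for what these two begin patterns match, they can be tried as plain JavaScript regular expressions on single lines. This is a sketch under assumptions: the helper names are hypothetical, and highlight.js compiles and orders patterns in its own way, so the final highlighting can differ from what the bare regexes suggest.

```javascript
// The two "begin" patterns from the rules above, used as plain regexes.
const FUNCTION_RULE = /(^|\s*)(:::?|\.)\w+(?=\(|$)/;
const PACKAGE_RULE = /(^|\s*)\w+(?=:::?|$)/;

// Hypothetical helpers: return the matched text, or null if nothing matches.
function matchFunctionRule(line) {
  const m = line.match(FUNCTION_RULE);
  return m ? m[0] : null;
}

function matchPackageRule(line) {
  const m = line.match(PACKAGE_RULE);
  return m ? m[0] : null;
}

console.log(matchFunctionRule("officer::read_docx()")); // the match includes the :: operator
console.log(matchFunctionRule(".libPaths()"));          // the match includes the leading dot
console.log(matchFunctionRule("print(x)"));             // null: no ::, ::: or . before the name
console.log(matchPackageRule("officer::read_docx()"));  // the package name only
```

On its own, the function pattern only fires after ::, ::: or a dot, so a bare call such as print(x) is not matched by it, and the matched text includes the operator itself.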
I used keyword for functions: that is what they are, and it interferes less with the predefined styles for functions.
I used meta for package names. This is what other packages use for similar constructs, and again, it gives a more consistent result with the built-in styles, e.g. for numbers.
I added print and c to the list of keywords. The list for the R language is obviously somewhat incomplete. Arguably every function name (even from third-party packages) should be added as a keyword, which is how some other languages do it, but that is not very practical.
This is what I get.
Sample Code:
hljs.registerLanguage("r",function(e){var r="([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";return{c:[e.HCM,{b:r,l:r,k:{keyword:"function if in break next repeat else for return switch while try tryCatch stop warning require library attach detach source setMethod setGeneric setGroupGeneric setClass c print ...",literal:"NULL NA TRUE FALSE T F Inf NaN NA_integer_|10 NA_real_|10 NA_character_|10 NA_complex_|10"},r:0},{cN:"number",b:"0[xX][0-9a-fA-F]+[Li]?\\b",r:0},{cN:"number",b:"\\d+(?:[eE][+\\-]?\\d*)?L\\b",r:0},{cN:"number",b:"\\d+\\.(?!\\d)(?:i\\b)?",r:0},{cN:"number",b:"\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{b:"`",e:"`",r:0},{cN:"string",c:[e.BE],v:[{b:'"',e:'"'},{b:"'",e:"'"}]},
{cN: "keyword", b: /(^|\s*)(:::?|\.)\w+(?=\(|$)/},
{cN: "meta",b: /(^|\s*)\w+(?=:::?|$)/,r: 0 }, ]}});
hljs.initHighlightingOnLoad();
<html lang="en-us">
<head> <meta charset="utf-8"><link href='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/agate.min.css' rel='stylesheet' type='text/css' />
</head><body>
<pre class="r"><code>library(officer)
doc_name <-
officer::read_docx() %>%
flextable:::body_add_flextable(table_to_save) %>%
print(target = "word.docx")
.libPaths()
x = 4
c("a", "b")
package::function()$field
</code></pre>
<script src="https://cdn.jsdelivr.net/gh/highlightjs/[email protected]/build/highlight.min.js"></script>
</body></html>
Pretty close, but far from perfect. The main hurdle is that I do not fully understand how the parser interprets the patterns. Some of the results make no sense to me but still work.