
PageDown through ScriptEngine incorrectly parsing Markdown

I am trying to use PageDown on the client side as an editor, and on the server side to then parse that Markdown to HTML.

It seems to work fine on the client side, but on the server side, backticks only "codify" the character that follows them, not the word they wrap. So if I do this:

test `test` test

I expect this, and this is indeed what I get on the client side:

test <code>test</code> test

But on the server side, I end up getting this instead:

test <code>t</code>est<code> </code>test

I've created a file called pageDown.js, which is simply Markdown.Converter.js and Markdown.Sanitizer.js combined into a single file, with this function added:

function getSanitizedHtml(pagedown){
    // getSanitizingConverter() is a factory function that returns a converter, so "new" is not needed
    var converter = Markdown.getSanitizingConverter();
    return converter.makeHtml(pagedown);
}

On the client side, I can use this file like so:

<!DOCTYPE html>
<html>
<head>
<script src="pageDown.js"></script>
<script>
function convert(){

    var html = getSanitizedHtml("test `test` test");

    console.log(html);

    document.getElementById("content").innerHTML = html;
}

</script>
</head>

<body onload="convert()">
<p id="content"></p>
</body>
</html>

That correctly displays: <p>test <code>test</code> test</p>

On the (Java) server side, I use this same exact file, through Java's ScriptEngineManager and Invocable:

import java.io.InputStreamReader;
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class PageDownTest{

    public static void main(String... args){

        try{
            ScriptEngineManager manager = new ScriptEngineManager();
            ScriptEngine engine = manager.getEngineByName("JavaScript");
            engine.eval(new InputStreamReader(PageDownTest.class.getResourceAsStream("pageDown.js")));
            Invocable inv = (Invocable) engine;
            String s = String.valueOf(inv.invokeFunction("getSanitizedHtml", "test `test` test"));
            System.out.println(s);
        }
        catch(Exception e){
            e.printStackTrace();
        }
    }
}

That program prints out this: <p>test <code>t</code>est<code></code>test</p>

I see similar problems with other markdown: test **test** test simply ignores the ** part. However, ##test correctly returns as <h2>test</h2>.

This all works fine if I go to the JavaScript directly through HTML, but not when I go through Java. What's going on here? Should I be handling Markdown on the server differently?

asked Sep 27 '22 19:09 by Kevin Workman


1 Answer

I managed to reduce the problem to the following code:

function getSanitizedHtml(text)
{
    return text.replace(/(a)(?!b)\1/gm, 'c');
}

When called in the browser as

getSanitizedHtml('aa');

it returns:

c

When called from the Nashorn engine as

String s = String.valueOf(inv.invokeFunction("getSanitizedHtml", "aa"));

it returns:

cc

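To me, this looks like the backreference \1, which should refer to (a), is instead being resolved against something zero-length (such as the lookahead (?!b)). A backreference to an empty capture matches the empty string, so each a is matched and replaced on its own.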
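The effect of such a misdirected backreference can be reproduced with java.util.regex: a backreference to a group that captured the empty string always succeeds without consuming anything. In this sketch the empty group () only stands in for the suspected zero-length capture; it is not what Nashorn literally does internally, and the class name EmptyBackrefDemo is made up.

```java
public class EmptyBackrefDemo {
    // Replace every match of `regex` in `input` with "c".
    public static String replace(String input, String regex) {
        return input.replaceAll(regex, "c");
    }

    public static void main(String[] args) {
        // \1 refers to a group that captured "a": one match covers both characters
        System.out.println(replace("aa", "(a)\\1"));   // c
        // \2 refers to the empty group (): it consumes nothing, so each "a" is a separate match
        System.out.println(replace("aa", "(a)()\\2")); // cc
    }
}
```

The second line produces the same "cc" that the Nashorn call returns.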

The equivalent code in Java:

System.out.println(("aa").replaceAll("(a)(?!b)\\1", "c"));

returns the correct result though:

c

Conclusion

I'm pretty sure this is a bug in the Nashorn engine.
I filed a bug report and will post its ID here, if it goes public.

As for your problem, I think your only option is to switch to a different JavaScript environment, at least temporarily.

Minimal, runnable examples

JS in browser:

function x(s){return s.replace(/(a)(?!b)\1/gm, 'c');}
document.write(x('aa'));

JS in Nashorn engine:

[ Ideone ]
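A self-contained Java harness for the reduced case could look like the following sketch (the class name NashornBackrefCheck is made up). Which engine getEngineByName("JavaScript") returns depends on the JDK: Nashorn on Java 8 through 14, and by default none at all from Java 15 on, where Nashorn was removed. The harness therefore prints the engine name so you can tell what you are actually testing.

```java
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class NashornBackrefCheck {
    // The reduced test case, as JavaScript source ("\\1" in Java becomes \1 in the JS regex)
    public static final String JS =
        "function getSanitizedHtml(text) { return text.replace(/(a)(?!b)\\1/gm, 'c'); }";

    // Runs the reduced case in the "JavaScript" engine; returns null if this JVM has none
    public static String runInEngine(String input) throws Exception {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null) {
            return null; // e.g. JDK 15+ without a separately installed engine
        }
        engine.eval(JS);
        return String.valueOf(((Invocable) engine).invokeFunction("getSanitizedHtml", input));
    }

    public static void main(String[] args) throws Exception {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        System.out.println("engine: " + (engine == null ? "none" : engine.getFactory().getEngineName()));
        // A correct engine gives "c"; the affected Nashorn build gave "cc"
        System.out.println("result: " + runInEngine("aa"));
    }
}
```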

Pure Java:

[ Ideone ]
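A self-contained pure-Java version of the same check, as a sketch (the class name BackrefCheck is made up), also trying a couple of neighbouring inputs:

```java
public class BackrefCheck {
    // Same pattern as the JavaScript regex /(a)(?!b)\1/
    static final String REGEX = "(a)(?!b)\\1";

    public static String apply(String s) {
        return s.replaceAll(REGEX, "c");
    }

    public static void main(String[] args) {
        // Prints: aa -> c, ab -> ab, aaa -> ca (one per line)
        for (String s : new String[] {"aa", "ab", "aaa"}) {
            System.out.println(s + " -> " + apply(s));
        }
    }
}
```

For "ab" the lookahead fails, so nothing is replaced; for "aaa" only the first two characters form a match.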

Possible fix

As already pointed out, your only option (at this point) is to switch to another JavaScript environment.
There are many of those available, and Wikipedia has a comparison page. For this example, I've chosen io.js (I trust you'll manage to install it on your own).

If you want to use your pageDown.js file, you'll first need to comment out the exports checks and use the plain old variables, like this:

/*if (typeof exports === "object" && typeof require === "function") // we're in a CommonJS (e.g. Node.js) module
    Markdown = exports;
else*/
    Markdown = {};

and

/*if (typeof exports === "object" && typeof require === "function") { // we're in a CommonJS (e.g. Node.js) module
    output = exports;
    Converter = require("./Markdown.Converter").Converter;
} else {*/
    output = Markdown;
    Converter = output.Converter;
//}

(Note that I also replaced output = window.Markdown; with output = Markdown;, which you must have done too, since Nashorn has no window object and would have given you an error otherwise; you probably just forgot to mention it in your question.)

Alternatively, you could of course use the exports system and separate files, but I have no experience with that, so I'll do it this way.

Now, io.js accepts JavaScript code from stdin, and you can write to stdout via process.stdout.write(), so we can do the following (on the command line):

{ cat pageDown.js; echo 'process.stdout.write(getSanitizedHtml("test `test` test"));'; } | iojs;

And we get the following back:

<p>test <code>test</code> test</p>

If you need to do that from Java, you can do it like this:

import java.io.*;

class Test
{
    public static void main(String[] args) throws Exception
    {
        Process p = Runtime.getRuntime().exec("/path/to/iojs");
        OutputStream stdin = p.getOutputStream();
        InputStream stdout = p.getInputStream();
        File file = new File("/path/to/pageDown.js");
        byte[] b = new byte[(int)file.length()];
        FileInputStream in = new FileInputStream(file);
        for(int read = 0; read < b.length; read += in.read(b, read, b.length - read)); // <-- note the semicolon
        in.close();
        stdin.write(b);
        // Leading newline so the call is not swallowed by a trailing comment in pageDown.js
        stdin.write("\nprocess.stdout.write(getSanitizedHtml('test `test` test'));".getBytes("UTF-8"));
        stdin.close(); // <-- important to close
        // Read stdout until EOF; available() only reports what happens to be buffered right now
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while((n = stdout.read(buf)) != -1)
            out.write(buf, 0, n);
        p.waitFor();
        System.out.println(out.toString("UTF-8"));
    }
}

Note the semicolon directly after the for: the loop body is empty, so each iteration only does read += in.read(b, read, b.length - read) and nothing else.

Also note that stdin.close() is required here. Java does not close streams automatically when a variable goes out of scope, and until stdin is closed, iojs keeps waiting for more input, so the program hangs.
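On Java 9 or later, the same idea can be written with ProcessBuilder, try-with-resources and InputStream.readAllBytes(). This is a sketch: the class and method names (PipeToProcess, runWithStdin) are made up for illustration, and it assumes the child reads all of its input before producing much output, which is the case for a script piped into iojs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class PipeToProcess {
    // Hypothetical helper: feeds `input` to the command's stdin and returns its stdout as UTF-8 text
    public static String runWithStdin(String[] command, String input)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // child's errors go to our stderr
        Process p = pb.start();

        // Closing stdin (done by try-with-resources) sends EOF, so the child stops waiting for input.
        // For large inputs to a child that writes output while still reading, write from a
        // separate thread instead, or both sides can block on full pipe buffers.
        try (OutputStream stdin = p.getOutputStream()) {
            stdin.write(input.getBytes(StandardCharsets.UTF_8));
        }

        byte[] out;
        try (InputStream stdout = p.getInputStream()) {
            out = stdout.readAllBytes(); // Java 9+: reads until the child closes its stdout
        }

        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("process exited with status " + exit);
        }
        return new String(out, StandardCharsets.UTF_8);
    }
}
```

With the files from above, the call would be runWithStdin(new String[] {"/path/to/iojs"}, script), where script is the contents of pageDown.js followed by the process.stdout.write(...) line.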

answered Oct 03 '22 05:10 by Siguza