I have a very small subset of Markdown along with some custom HTML that I would like to parse into React components. For example, I would like to turn the following string:
hello *asdf* *how* _are_ you !doing! today
Into the following array:
[ "hello ", <strong>asdf</strong>, " ", <strong>how</strong>, " ", <em>are</em>, " you ", <MyComponent onClick={this.action}>doing</MyComponent>, " today" ]
and then return it from a React render function (React will render the array properly as formatted HTML).
Basically, I want to give users the option to use a very limited set of Markdown to turn their text into styled components (and in some cases my own components!)
It is unwise to dangerouslySetInnerHTML, and I do not want to bring in an external dependency, because they are all very heavy, and I only need very basic functionality.
I'm currently doing something like this, but it is very brittle, and doesn't work for all cases. I was wondering if there were a better way:
function matchStrong(result, i) {
let match = result[i].match(/(^|[^\\])\*(.*)\*/);
if (match) { result[i] = <strong key={"ms" + i}>{match[2]}</strong>; }
return match;
}
function matchItalics(result, i) {
let match = result[i].match(/(^|[^\\])_(.*)_/); // Ignores \_asdf_ but not _asdf_
if (match) { result[i] = <em key={"mi" + i}>{match[2]}</em>; }
return match;
}
function matchCode(result, i) {
let match = result[i].match(/(^|[^\\])```\n?([\s\S]+)\n?```/);
if (match) { result[i] = <code key={"mc" + i}>{match[2]}</code>; }
return match;
}
// Very brittle and inefficient
export function convertMarkdownToComponents(message) {
let result = message.match(/(\\?([!*_`+-]{1,3})([\s\S]+?)\2)|\s|([^\\!*_`+-]+)/g);
if (result == null) { return message; }
for (let i = 0; i < result.length; i++) {
if (matchCode(result, i)) { continue; }
if (matchStrong(result, i)) { continue; }
if (matchItalics(result, i)) { continue; }
}
return result;
}
Here is my previous question which led to this one.
It works by reading a string chunk by chunk, which might not be the best solution for really long strings.
Whenever the parser detects a critical chunk is being read, i.e. '*' or any other markdown tag, it starts parsing chunks of this element until the parser finds its closing tag.
It works on multi-line strings, see the code for example.
You haven't specified (or I could have misunderstood your needs) whether tags that are both bold and italic need to be parsed; my current solution might not work in that case.
If you need, however, to work with the above conditions just comment here and I'll tweak the code.
Tags are no longer hardcoded; instead they are stored in a map that you can easily extend to fit your needs.
Fixed the bugs you've mentioned in the comments, thanks for pointing out these issues =p
Though the method parseMarkdown does not yet support multi-length tags, we can easily replace those multi-length tags with a simple string.replace when sending our rawMarkdown prop.
To see an example of this in practice, look at the ReactDOM.render, located at the end of the code.
Even if your application does support multiple languages, there are Unicode code points that never represent real text but that JavaScript still handles like any other character. For example, "\uFFFF" is a Unicode noncharacter (it will never be assigned to a real character), but JS is still able to compare it ("\uFFFF" === "\uFFFF" evaluates to true).
It might seem hack-y at first but, depending on your use case, I don't see any major issues with using this route.
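The placeholder trick described above can be sketched on its own, outside React. This is only an illustration: the helper name toSingleCharTags and the guard against input that already contains the placeholder are assumptions of mine; only the "\uFFFF" substitution itself comes from the answer.

```javascript
// Placeholder used in place of the multi-length ``` tag (taken from the answer).
const PLACEHOLDER = "\uFFFF";

// Hypothetical helper: swap every ``` for the single-character placeholder
// so that a character-by-character parser can treat it like any other tag.
function toSingleCharTags(markdown) {
  // Assumption: refuse input that already contains the placeholder,
  // otherwise it would be mistaken for a code fence later on.
  if (markdown.includes(PLACEHOLDER)) {
    throw new Error("Input already contains the placeholder character");
  }
  return markdown.replace(/```/g, PLACEHOLDER);
}

const prepared = toSingleCharTags("run ```npm test``` now");
console.log(prepared.split(PLACEHOLDER).length); // number of parts around the placeholders
```

Whether the guard should throw or strip the character instead depends on where the text comes from.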
Well, we could easily track the last N chunks (where N corresponds to the length of the longest multi-length tag).
There would be some tweaks to make to the way the loop inside the parseMarkdown method behaves, i.e. checking whether the current chunk is part of a multi-length tag. If it is, use it as a tag; otherwise, in cases like ``k, we'd need to mark it as notMultiLength or something similar and push that chunk as content.
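One possible way to support multi-length tags is sketched below. Note that it swaps in a different technique from the one described above: instead of tracking the last N chunks, it looks ahead from the current position and tries the longest symbol first. The function name tokenize, the plain { tag, text } token objects (used instead of React elements so the sketch runs without React), and the choice to keep non-matching symbols inside an open tag as literal text are all assumptions for illustration.

```javascript
// Sketch: tokenizer with multi-length tag support via longest-match lookahead.
// `tags` maps a symbol of any length to a tag name, e.g. { "```": "pre" }.
function tokenize(source, tags) {
  // Try longer symbols first so "```" wins over any shorter symbol.
  const symbols = Object.keys(tags).sort((a, b) => b.length - a.length);
  const tokens = [];
  let open = null; // symbol of the tag currently being parsed, or null
  let text = "";

  // Push the accumulated text (if any) as a token and reset the buffer.
  const flush = (tag) => {
    if (text !== "") tokens.push({ tag, text });
    text = "";
  };

  let i = 0;
  while (i < source.length) {
    // A backslash makes the next character literal.
    if (source[i] === "\\" && i + 1 < source.length) {
      text += source[i + 1];
      i += 2;
      continue;
    }
    const symbol = symbols.find((s) => source.startsWith(s, i));
    if (symbol !== undefined && open === null) {
      flush(null); // plain text seen before the opening symbol
      open = symbol;
      i += symbol.length;
    } else if (symbol !== undefined && symbol === open) {
      flush(tags[open]); // closing symbol matches the opening one
      open = null;
      i += symbol.length;
    } else {
      // Either ordinary text (this covers "``k") or a different symbol
      // inside an open tag, which this sketch keeps as literal content.
      const literal = symbol !== undefined ? symbol : source[i];
      text += literal;
      i += literal.length;
    }
  }
  flush(null); // an unfinished tag ends up as plain text
  return tokens;
}

console.log(tokenize("a *b* ```c_d``` ``k", { "*": "strong", "_": "em", "```": "pre" }));
```

In a component, each token could then be turned into an element, for example with React.createElement(token.tag || "span", { key: index }, token.text).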
// Instead of creating hardcoded variables, we can make the code more extendable
// by storing all the possible tags we'll work with in a Map. Thus, creating
// more tags will not require additional logic in our code.
const tags = new Map(Object.entries({
"*": "strong", // bold
"!": "button", // action
"_": "em", // emphasis
"\uFFFF": "pre", // Just use a very unlikely to happen unicode character,
// We'll replace our multi-length symbols with that one.
}));
// Might be useful if we need to discover the symbol of a tag
const tagSymbols = new Map();
tags.forEach((v, k) => { tagSymbols.set(v, k ); })
const rawMarkdown = `
This must be *bold*,
This also must be *bo_ld*,
this _entire block must be
emphasized even if it's comprised of multiple lines_,
This is an !action! it should be a button,
\`\`\`
beep, boop, this is code
\`\`\`
This is an asterisk\\*
`;
class App extends React.Component {
parseMarkdown(source) {
let currentTag = "";
let currentContent = "";
const parsedMarkdown = [];
// We create this variable to track possible escape characters, eg. "\"
let before = "";
const pushContent = (
content,
tagValue,
props,
) => {
let children = undefined;
// There's the need to parse for empty lines
if (content.indexOf("\n\n") >= 0) {
let before = "";
const contentJSX = [];
let chunk = "";
for (let i = 0; i < content.length; i++) {
if (i !== 0) before = content[i - 1];
chunk += content[i];
if (before === "\n" && content[i] === "\n") {
contentJSX.push(chunk);
contentJSX.push(<br key={i} />); // elements in an array need a key
chunk = "";
}
if (chunk !== "" && i === content.length - 1) {
contentJSX.push(chunk);
}
}
children = contentJSX;
} else {
children = [content];
}
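// Give every top-level element a key, since they are rendered as an array
parsedMarkdown.push(React.createElement(tagValue, { key: parsedMarkdown.length, ...props }, children));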
};
for (let i = 0; i < source.length; i++) {
const chunk = source[i];
if (i !== 0) {
before = source[i - 1];
}
// Does our current chunk need to be treated as an escaped char?
const escaped = before === "\\";
// Detect if we need to start/finish parsing our tags
// We are not parsing anything, however, that could change at current
// chunk
if (currentTag === "" && escaped === false) {
// If our tags array has the chunk, this means a markdown tag has
// just been found. We'll change our current state to reflect this.
if (tags.has(chunk)) {
currentTag = tags.get(chunk);
// We have simple content to push
if (currentContent !== "") {
pushContent(currentContent, "span");
}
currentContent = "";
}
} else if (currentTag !== "" && escaped === false) {
// We'll look if we can finish parsing our tag
if (tags.has(chunk)) {
const symbolValue = tags.get(chunk);
// Just because the current chunk is a symbol it doesn't mean we
// can already finish our currentTag.
//
// We'll need to see if the symbol's value corresponds to the
// value of our currentTag. In case it does, we'll finish parsing it.
if (symbolValue === currentTag) {
pushContent(
currentContent,
currentTag,
undefined, // you could pass props here
);
currentTag = "";
currentContent = "";
}
}
}
// Increment our currentContent
//
// Ideally, we don't want our rendered markdown to contain any '\'
// or undesired '*' or '_' or '!'.
//
// Users can still escape '*', '_', '!' by prefixing them with '\'
if (tags.has(chunk) === false || escaped) {
if (chunk !== "\\" || escaped) {
currentContent += chunk;
}
}
// In case an erroneous, i.e. unfinished, tag is present and we've
// reached the end of our source (rawMarkdown), we want to make sure
// all our currentContent is pushed as a simple string
if (currentContent !== "" && i === source.length - 1) {
pushContent(
currentContent,
"span",
undefined,
);
}
}
return parsedMarkdown;
}
render() {
return (
<div className="App">
<div>{this.parseMarkdown(this.props.rawMarkdown)}</div>
</div>
);
}
}
ReactDOM.render(<App rawMarkdown={rawMarkdown.replace(/```/g, "\uFFFF")} />, document.getElementById('app'));
Link to the code (TypeScript) https://codepen.io/ludanin/pen/GRgNWPv
Link to the code (vanilla/babel) https://codepen.io/ludanin/pen/eYmBvXw
It looks like you are looking for a small, very basic solution, not "super-monsters" like react-markdown-it :)
I would like to recommend https://github.com/developit/snarkdown, which looks pretty lightweight and nice! It is just 1kb and extremely simple; you can use it and extend it if you need any other syntax features.
Supported tags list https://github.com/developit/snarkdown/blob/master/src/index.js#L1
I just noticed the part about React components, which I missed at first. In that case I believe the best approach is to take the library as an example and implement your own custom components, so you can get it done without setting HTML dangerously. The library is pretty small and clear. Have fun with it! :)
var table = {
"*":{
"begin":"<strong>",
"end":"</strong>"
},
"_":{
"begin":"<em>",
"end":"</em>"
},
"!":{
"begin":"<MyComponent onClick={this.action}>",
"end":"</MyComponent>"
},
};
var myMarkdown = "hello *asdf* *how* _are_ you !doing! today";
var tagFinder = /(?<item>(?<tag_begin>[*|!|_])(?<content>\w+)(?<tag_end>\k<tag_begin>))/gm;
//Use case 1: direct string replacement
var replaced = myMarkdown.replace(tagFinder, replacer);
function replacer(match, whole, tag_begin, content, tag_end, offset, string) {
return table[tag_begin]["begin"] + content + table[tag_begin]["end"];
}
alert(replaced);
//Use case 2: React components
var pieces = [];
var lastMatchedPosition = 0;
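myMarkdown.replace(tagFinder, breaker);
// Push whatever text follows the last match (" today" in this example)
if (lastMatchedPosition < myMarkdown.length) {
pieces.push("\"" + myMarkdown.substring(lastMatchedPosition) + "\"");
}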
function breaker(match, whole, tag_begin, content, tag_end, offset, string) {
var piece;
if (lastMatchedPosition < offset)
{
piece = string.substring(lastMatchedPosition, offset);
pieces.push("\"" + piece + "\"");
}
piece = table[tag_begin]["begin"] + content + table[tag_begin]["end"];
pieces.push(piece);
lastMatchedPosition = offset + match.length;
}
alert(pieces);
The result: [image: regexp test result]
Explanation:
/(?<item>(?<tag_begin>[*|!|_])(?<content>\w+)(?<tag_end>\k<tag_begin>))/
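You can define your tags in this section: [*|!|_]. Once one of them is matched, it is captured as a group named "tag_begin". (Inside a character class, | is a literal character rather than an alternation, so this class also matches a stray |; [*!_] is enough.)
Then (?<content>\w+) captures the content wrapped by the tag. Since \w only matches letters, digits and underscores, content containing spaces or punctuation is not matched.
The ending tag must be the same as the previously matched one, so \k<tag_begin> is used here; if it passes the test, it is captured as a group named "tag_end". That is what (?<tag_end>\k<tag_begin>)) is saying.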
In the JS you've set up a table like this:
var table = {
"*":{
"begin":"<strong>",
"end":"</strong>"
},
"_":{
"begin":"<em>",
"end":"</em>"
},
"!":{
"begin":"<MyComponent onClick={this.action}>",
"end":"</MyComponent>"
},
};
Use this table to replace the matched tags.
String.replace has an overload, String.replace(regexp, function), whose callback receives the captured groups as its parameters. We use these captured items to look up the table and generate the replacement string.
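As a small illustration of the callback form: when the pattern contains named groups, the last argument passed to the callback is an object holding those groups, which avoids counting positional parameters. The names htmlTags, finder and toHtml below are hypothetical, and the pattern is a trimmed-down variant of the one above, not the original.

```javascript
// Hypothetical lookup table: tag symbol -> HTML element name.
const htmlTags = { "*": "strong", "_": "em" };

// Trimmed-down variant of the pattern above: one symbol, word characters,
// then the same symbol again via the named backreference \k<tag>.
const finder = /(?<tag>[*_])(?<content>\w+)\k<tag>/g;

function toHtml(text) {
  return text.replace(finder, (...args) => {
    // With named groups, the final callback argument is the groups object.
    const { tag, content } = args[args.length - 1];
    return `<${htmlTags[tag]}>${content}</${htmlTags[tag]}>`;
  });
}

console.log(toHtml("hello *asdf* _are_ you"));
```

Text whose opening and closing symbols differ, such as *broken_, is left untouched because the backreference fails to match.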
[Update]
I have updated the code. I kept the first version in case someone else doesn't need React components; as you can see, there is little difference between the two.
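For the React use case, one way to get closer to the array asked for in the question is to collect plain strings and small descriptor objects instead of markup strings. The sketch below is an assumption-laden illustration, not the code from this answer: splitIntoPieces, the components table and the { type, children } shape are hypothetical, and it uses String.prototype.matchAll rather than replace.

```javascript
// Hypothetical table: tag symbol -> element or component name.
const components = { "*": "strong", "_": "em", "!": "MyComponent" };
const pattern = /(?<tag>[*_!])(?<content>\w+)\k<tag>/g;

// Split text into plain strings and { type, children } descriptors.
function splitIntoPieces(text) {
  const pieces = [];
  let last = 0; // end position of the previous match
  for (const m of text.matchAll(pattern)) {
    // Plain text between the previous match and this one.
    if (m.index > last) pieces.push(text.slice(last, m.index));
    pieces.push({ type: components[m.groups.tag], children: m.groups.content });
    last = m.index + m[0].length;
  }
  // Plain text after the final match.
  if (last < text.length) pieces.push(text.slice(last));
  return pieces;
}

console.log(splitIntoPieces("hello *asdf* *how* _are_ you !doing! today"));
```

Inside a render function, strings could be returned as they are and each descriptor mapped to React.createElement(type, { key: index }, children), with "MyComponent" resolved to the actual component and its props (such as onClick) supplied at that point.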