
How to test if a string has Markdown in it

I'm looking for an easy way to test whether a string contains Markdown. Currently I'm thinking of converting the string to HTML and then checking the result for HTML tags with a simple regex, but I wonder if there is a more succinct way to do it.

Here's what I've got so far:

/<[a-z][\s\S]*>/i.test( markdownToHtml(string) )
asked Jul 10 '14 at 21:07 by jwerre


1 Answer

I think you have to accept that it's impossible to know with certainty. Markdown borrows its syntax from existing conventions. For example, underscores for italics were popular on Usenet (though there single asterisks meant bold, whereas Markdown treats them as italics too). And of course, people were using dashes as plaintext bullet points long before Markdown.

Having decided it's subjective though, we may now embark on the task of determining degrees of likelihood that a piece of text contains Markdown. Here are some things I'd consider evidence for Markdown, in order of decreasing strength:

  1. Consecutive lines beginning with 1., e.g. (^|[\n\r])\s*1\.\s.*\s+1\.\s. (See the Markdown behind this answer, for example.) I'd consider this a dead giveaway, because there's even that joke:

    There are only two kinds of people in this world.

    1. Those who understand Markdown.

    1. And those who don't.

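  2. Link markdown, e.g. \[[^\]]+\]\(https?:\/\/\S+\). (The ] inside the character class is escaped because JavaScript reads an unescaped [^] as "any character".)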

  3. Double underscores or asterisks that form a left-right pair (the opening marker has whitespace on its left, the closing marker has whitespace on its right), e.g. \s(__|\*\*)(?!\s)(.(?!\1))+(?!\s(?=\1)). Let me know if you want me to explain this one.
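To make the three checks above concrete, here is a minimal JavaScript sketch. The names signals and detectSignals are made up for illustration. The strongPair pattern is a simplified stand-in for item 3, not the exact regex quoted there, so treat it as an assumption to tune.

```javascript
// Hypothetical names; the patterns are adapted from the list above.
const signals = {
  // Two list items in a row that both start with "1."
  repeatedOnes: /(^|[\n\r])\s*1\.\s.*\s+1\.\s/,
  // Inline link such as [text](https://example.com)
  inlineLink: /\[[^\]]+\]\(https?:\/\/\S+\)/,
  // Simplified stand-in for item 3: an opening ** or __ preceded by
  // whitespace (or the start of the string) and followed by a non-space,
  // closed later by the same marker directly after a non-space.
  strongPair: /(^|\s)(__|\*\*)(?=\S)[\s\S]*?\S\2/,
};

// Return the names of all signals whose pattern matches the text.
function detectSignals(text) {
  const found = [];
  for (const [name, pattern] of Object.entries(signals)) {
    if (pattern.test(text)) found.push(name);
  }
  return found;
}

console.log(detectSignals("This is **really** important."));
console.log(detectSignals("Just a plain sentence."));
```

None of the patterns uses the g flag, so calling test repeatedly does not carry state between strings.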

And so on. Ultimately, you'll have to come up with your own "scoring" system to determine the weight of each of these things. A good way to go about this would be to gather some sample inputs (real ones, if you have them), classify them manually as having Markdown or not, and then run your regexes and scoring system over them to see which weights sort them most accurately.
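As a rough sketch of that calibration loop in JavaScript: the weights, the threshold values, the dashBullets rule, and the four samples below are all invented for illustration, and the function names markdownScore and accuracy are hypothetical.

```javascript
// Placeholder weights, chosen only for illustration; tune them on real data.
const rules = [
  { name: "repeatedOnes", weight: 3, pattern: /(^|[\n\r])\s*1\.\s.*\s+1\.\s/ },
  { name: "inlineLink", weight: 2, pattern: /\[[^\]]+\]\(https?:\/\/\S+\)/ },
  // Weak evidence: dash bullets were common in plain text before Markdown.
  { name: "dashBullets", weight: 1, pattern: /(^|[\n\r])\s*-\s.*[\n\r]+\s*-\s/ },
];

// Sum the weights of every rule whose pattern matches the text.
function markdownScore(text, rules) {
  return rules.reduce((sum, r) => (r.pattern.test(text) ? sum + r.weight : sum), 0);
}

// Fraction of hand-labeled samples classified correctly at a given threshold.
function accuracy(samples, rules, threshold) {
  const correct = samples.filter(
    (s) => (markdownScore(s.text, rules) >= threshold) === s.isMarkdown
  ).length;
  return correct / samples.length;
}

// Hand-labeled samples (made up for this sketch).
const samples = [
  { text: "1. First\n1. Second", isMarkdown: true },
  { text: "Read [this](https://example.com) first.", isMarkdown: true },
  { text: "Shopping:\n- milk\n- eggs", isMarkdown: false },
  { text: "Nothing special here.", isMarkdown: false },
];

// Try a few thresholds and see which one separates the samples best.
for (const threshold of [1, 2, 3]) {
  console.log(threshold, accuracy(samples, rules, threshold));
}
```

On this toy set a threshold of 2 classifies all four samples correctly, while 1 and 3 each get one wrong. With real inputs you would repeat the same comparison over many more samples and adjust the weights as well as the threshold.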

answered Oct 27 '22 at 07:10 by slackwing