
Convert rich Markdown to plain text

How can I convert rich Markdown into plain text, so that it can be used, e.g., for a Facebook Open Graph description?

I'm using MarkdownSharp, and it doesn't seem to have this functionality. Before reinventing the wheel, I thought I'd ask here first.

Any hints about an implementation strategy are greatly appreciated!

Example

The Monorailcat
---------------
![Picture of a Lolcat](https://media1.giphy.com/media/c7goDcMPKjw6A/200_s.gif)
One of the earliest pictures of **monorail cat** found is from the website [catmas.com’s blog][1] section, dated from November 2, 2006. 
[1]: http://catmas.com/blog

Should be converted to:

The Monorailcat
One of the earliest pictures of monorail cat found is from the website catmas.com’s blog section, dated from November 2, 2006.
Dirk Boer asked Dec 26 '15 20:12



2 Answers

You have a few possibilities.

  1. As stated in a comment, you can convert to HTML, then convert the HTML to plain text. This is probably the most reliable and consistent approach across platforms.

  2. Switch to a library that can convert between multiple formats, including the formats you desire. Pandoc would be an example of such a tool.

  3. Use a Markdown parser which outputs an AST. While such parsers usually provide an HTML renderer (accepts AST as input and outputs HTML), you can create your own renderer which outputs whatever format you want.
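To illustrate the second half of option 1, here is a minimal sketch in Python (not C#) that strips tags from already-rendered HTML using only the standard library's `html.parser`. The names `PlainTextExtractor` and `html_to_text` are made up for this example, and the input HTML is an assumption about what a Markdown-to-HTML converter would roughly emit for the question's sample; actual MarkdownSharp output may differ in details.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collects text nodes and drops every tag (so images contribute nothing)."""

    # Closing one of these tags ends a line. This is an illustrative subset.
    BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
                  "li", "blockquote", "pre"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        # Collapse whitespace inside each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Assumed HTML rendering of the example from the question.
html = (
    "<h2>The Monorailcat</h2>\n"
    '<p><img src="https://media1.giphy.com/media/c7goDcMPKjw6A/200_s.gif" '
    'alt="Picture of a Lolcat" />\n'
    "One of the earliest pictures of <strong>monorail cat</strong> found is "
    'from the website <a href="http://catmas.com/blog">catmas.com’s blog</a> '
    "section, dated from November 2, 2006.</p>"
)
print(html_to_text(html))
```

The same idea should carry over to C# with any HTML parser: walk the document, keep text nodes, and insert line breaks at block boundaries.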

Actually, it turns out that Pandoc is also an example of #3; it just happens to already have a plain text renderer. Of course, if you are looking for a C# library, Pandoc may not meet your needs, and I'm not aware of any C# library that does: the reference implementation uses regex string substitution, and many (most?) parsers have followed that example. That said, I'm not familiar with the Markdown libraries available in C#, and this is not an appropriate place to make recommendations. However, there is a lengthy, albeit incomplete, list of parsers here. You may find something of use there.
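As a rough illustration of option 3, the Python sketch below walks a hand-built AST and renders plain text instead of HTML. The `Node` type and its `kind` values are hypothetical and only stand in for whatever node types a real AST-producing parser exposes; the tree is written out by hand rather than parsed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node; real parsers each define their own node types."""
    kind: str        # "document", "heading", "paragraph", "text", "strong", "link", "image"
    text: str = ""   # literal content, used by "text" nodes only
    children: list = field(default_factory=list)


BLOCK_KINDS = {"heading", "paragraph"}


def render_plain(node):
    """Render a node and its descendants as plain text."""
    if node.kind == "text":
        return node.text
    if node.kind == "image":
        return ""  # drop images entirely, including their alt text
    inner = "".join(render_plain(child) for child in node.children)
    if node.kind in BLOCK_KINDS:
        # One line per block, with internal whitespace collapsed.
        return " ".join(inner.split()) + "\n"
    return inner  # inline wrappers (strong, link) keep only their text


# Hand-built tree for the example from the question.
document = Node("document", children=[
    Node("heading", children=[Node("text", "The Monorailcat")]),
    Node("paragraph", children=[
        Node("image"),
        Node("text", "\nOne of the earliest pictures of "),
        Node("strong", children=[Node("text", "monorail cat")]),
        Node("text", " found is from the website "),
        Node("link", children=[Node("text", "catmas.com’s blog")]),
        Node("text", " section, dated from November 2, 2006."),
    ]),
])
print(render_plain(document).strip())
```

The renderer decides per node type what survives, which is why this approach tends to be more predictable than post-processing text: dropping images or keeping link text is an explicit choice rather than a side effect of a pattern.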

Waylan answered Sep 19 '22 14:09



Some libraries exist that help you remove Markdown syntax, such as removemarkdown or strip-markdown.
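To give an idea of what stripping syntax directly can look like, here is a small regex-based sketch in Python. It is not the implementation of either library, and the function name `strip_markdown` is made up for this example. It only covers the constructs in the question's sample plus a few common ones, and it will misfire on edge cases such as underscores inside identifiers or parentheses inside URLs.

```python
import re

# (pattern, replacement) pairs, applied in order. Order matters: images must
# go before links, and link definitions before reference links.
RULES = [
    (r"!\[[^\]]*\]\([^)]*\)", ""),               # inline images: ![alt](url)
    (r"^[ \t]*\[[^\]]+\]:[ \t]+\S+.*$", ""),     # link definitions: [1]: url
    (r"\[([^\]]+)\]\([^)]*\)", r"\1"),           # inline links: [text](url)
    (r"\[([^\]]+)\]\[[^\]]*\]", r"\1"),          # reference links: [text][id]
    (r"^[ \t]*(=+|-+)[ \t]*$", ""),              # setext heading underlines
    (r"^#{1,6}[ \t]+", ""),                      # ATX heading markers
    (r"(\*\*|__)(.+?)\1", r"\2"),                # bold
    (r"(\*|_)(.+?)\1", r"\2"),                   # emphasis
]


def strip_markdown(md):
    for pattern, replacement in RULES:
        md = re.sub(pattern, replacement, md, flags=re.MULTILINE)
    # Collapse whitespace inside each line and drop the lines left empty.
    lines = (" ".join(line.split()) for line in md.splitlines())
    return "\n".join(line for line in lines if line)


sample = """The Monorailcat
---------------
![Picture of a Lolcat](https://media1.giphy.com/media/c7goDcMPKjw6A/200_s.gif)
One of the earliest pictures of **monorail cat** found is from the website [catmas.com’s blog][1] section, dated from November 2, 2006. 
[1]: http://catmas.com/blog
"""
print(strip_markdown(sample))
```

For anything beyond short descriptions, a real parser is likely the safer route, since nested or escaped syntax is hard to handle with substitutions alone.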

joepio answered Sep 19 '22 14:09
