
Is Pandoc capable of injecting arbitrary HTML attributes into any element?

Code blocks can define HTML attributes using the fenced_code_attributes extension:

~~~~ {#mycode .haskell .numberLines startFrom="100"}
qsort []     = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++
               qsort (filter (>= x) xs)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Is it possible to use the above syntax, in some way, for regular text blocks? For example, I'd like to convert the following Markdown text:

# My header

~~~ {.text}
This is regular text. This is regular text.
~~~

~~~ {.quote}
> This is the first level of quoting.
>
> > This is nested blockquote.
>
> Back to the first level.
~~~

~~~ {data-id=test-123}
+   Red
+   Green
+   Blue
~~~

into something like this:

<h1 id="my-header">My header</h1>
<p class="text">This is regular text. This is regular text.</p>
<blockquote class="quote">
<p>This is the first level of quoting.</p>
<blockquote>
<p>This is nested blockquote.</p>
</blockquote>
<p>Back to the first level.</p>
</blockquote>
<ul data-id="test-123">
<li>Red</li>
<li>Green</li>
<li>Blue</li>
</ul>

If there is no such support in Pandoc itself, would it be possible to create a custom writer in Lua that does so?

Edit: Looking at the sample.lua custom writer, does anyone know what the "attributes table" on line 35 is? And how does one pass these attributes to specific Pandoc elements? Also, the functionality I'm looking for above is very similar to the header_attributes extension, except that it would work for all elements, not just headers.

asked Nov 25 '13 18:11 by mart1n


2 Answers

This is very doable in kramdown, which will convert the following input

# My header

This is regular text. This is regular text.
{: .text}

> This is the first level of quoting.
>
> > This is nested blockquote.
>
> Back to the first level.
{: .quote}

+   Red
+   Green
+   Blue
{: data-id="test-123"}

to

<h1 id="my-header">My header</h1>

<p class="text">This is regular text. This is regular text.</p>

<blockquote class="quote">
  <p>This is the first level of quoting.</p>

  <blockquote>
    <p>This is nested blockquote.</p>
  </blockquote>

  <p>Back to the first level.</p>
</blockquote>

<ul data-id="test-123">
  <li>Red</li>
  <li>Green</li>
  <li>Blue</li>
</ul>

See the attribute list definition section of the kramdown syntax documentation for details.

answered Oct 04 '22 17:10 by Kyle Barbour


Pandoc's filters let you operate on Pandoc's internal representation of the document. It's possible to have a chain of filters that do different transformations. I'll share two illustrative examples of filters that should help.

Markdown Code Blocks

Code blocks in Pandoc are usually meant to embed source code listings, but here we want to extract the body and interpret it as Markdown. Rather than reusing the classes from your input document (text, quote), let's mark these blocks with a generic as-markdown class. Once the contents are parsed as Markdown, Pandoc generates the appropriate tags automatically.

# My header

~~~ {.as-markdown}
This is regular text. This is regular text.
~~~

~~~ {.as-markdown}
> This is the first level of quoting.
>
> > This is nested blockquote.
>
> Back to the first level.
~~~

~~~ {.as-markdown data-id=test-123}
+   Red
+   Green
+   Blue
~~~

~~~ haskell
main :: IO ()
~~~

To ensure code blocks without the as-markdown class are interpreted as usual, I included a haskell code block. Here's the filter implementation:

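#!/usr/bin/env runhaskell
-- Written against the later pandoc 1.x library API, where readMarkdown takes a
-- String and returns an Either that handleError unwraps. pandoc 2.x changed both.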
import Text.Pandoc.Definition       (Pandoc(..), Block(..), Format(..))
import Text.Pandoc.Error            (handleError)
import Text.Pandoc.JSON             (toJSONFilter)
import Text.Pandoc.Options          (def)
import Text.Pandoc.Readers.Markdown (readMarkdown)

asMarkdown :: String -> [Block]
asMarkdown contents =
  case handleError $ readMarkdown def contents of
    Pandoc _ blocks -> blocks

-- | Unwrap each CodeBlock with the "as-markdown" class, interpreting
-- its contents as Markdown.
markdownCodeBlock :: Maybe Format -> Block -> IO [Block]
markdownCodeBlock _ cb@(CodeBlock (_id, classes, _namevals) contents) =
  if "as-markdown" `elem` classes then
    return $ asMarkdown contents
  else
    return [cb]
markdownCodeBlock _ x = return [x]

main :: IO ()
main = toJSONFilter markdownCodeBlock

Running pandoc --filter markdown-code-block.hs index.md produces:

<h1 id="my-header">My header</h1>
<p>This is regular text. This is regular text.</p>
<blockquote>
<p>This is the first level of quoting.</p>
<blockquote>
<p>This is nested blockquote.</p>
</blockquote>
<p>Back to the first level.</p>
</blockquote>
<ul>
<li>Red</li>
<li>Green</li>
<li>Blue</li>
</ul>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span class="ot">main ::</span> <span class="dt">IO</span> ()</code></pre></div>

Almost there! The only part that's not quite right is the HTML attributes.
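One way to close that gap, sketched here under some assumptions: Para, BlockQuote and BulletList carry no attribute slot in Pandoc's document model, so the filter could keep the code block's attributes by wrapping the parsed blocks in a Div. The types below are simplified stand-ins that only mirror the shape of Pandoc's Attr and Block so the sketch runs with base alone, and unwrapKeepingAttrs, stripMarker and isEmptyAttr are hypothetical names. A real filter would use the types from Text.Pandoc.Definition and pass asMarkdown as the parser.

```haskell
-- Stand-ins that mirror the shape of pandoc-types' Attr and Block.
-- Assumption: a real filter imports these from Text.Pandoc.Definition.
type Attr = (String, [String], [(String, String)])

data Block
  = Para String
  | CodeBlock Attr String
  | Div Attr [Block]
  deriving (Eq, Show)

-- Drop the marker class so it does not leak into the generated HTML.
stripMarker :: Attr -> Attr
stripMarker (ident, classes, kvs) =
  (ident, filter (/= "as-markdown") classes, kvs)

-- True when no identifier, class, or key-value pair is left to emit.
isEmptyAttr :: Attr -> Bool
isEmptyAttr (ident, classes, kvs) = null ident && null classes && null kvs

-- Unwrap a code block marked "as-markdown". The parser is passed in as an
-- argument (hypothetical here); leftover attributes are kept on a wrapping Div.
unwrapKeepingAttrs :: (String -> [Block]) -> Block -> [Block]
unwrapKeepingAttrs parse cb@(CodeBlock attr@(_, classes, _) contents)
  | "as-markdown" `notElem` classes = [cb]
  | isEmptyAttr kept                = parse contents
  | otherwise                       = [Div kept (parse contents)]
  where
    kept = stripMarker attr
unwrapKeepingAttrs _ block = [block]
```

With this approach the list from the question would come out wrapped in a div carrying data-id="test-123" rather than as <ul data-id="test-123">, so it only approximates the requested output.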

Custom HTML Attributes from Code Block Metadata

The following filter should help you get started. It converts code blocks with the web-script class to an HTML <script> tag when the target format is html or html5.

#!/usr/bin/env runhaskell
import Text.Pandoc.Builder
import Text.Pandoc.JSON

webFormats :: [String]
webFormats =
  [ "html"
  , "html5"
  ]

script :: String -> Block
script src = Para $ toList $ rawInline "html" ("<script type='application/javascript'>" <> src <> "</script>")

injectScript :: Maybe Format -> Block -> IO Block
injectScript (Just (Format format)) cb@(CodeBlock (_id, classes, _namevals) contents) =
  if "web-script" `elem` classes then
    if format `elem` webFormats then
      return $ script contents
    else
      return Null
  else
    return cb
injectScript _ x = return x

main :: IO ()
main = toJSONFilter injectScript

The data-id=test-123 from your last block arrives in _namevals, a list of key-value pairs of type [(String, String)]. All you'd need to do is refactor script to support arbitrary tags and key-value pairs for HTML attributes, and specify what HTML to generate based on those inputs. To see the native representation of the input document, run pandoc -t native index.md:

[Header 1 ("my-header",[],[]) [Str "My",Space,Str "header"]
,CodeBlock ("",["as-markdown"],[]) "This is regular text. This is regular text."
,CodeBlock ("",["as-markdown"],[]) "> This is the first level of quoting.\n>\n> > This is nested blockquote.\n>\n> Back to the first level."
,CodeBlock ("",["as-markdown"],[("data-id","test-123")]) "+   Red\n+   Green\n+   Blue"
,Para [Str "To",Space,Str "ensure",Space,Str "regular",Space,Str "code",Space,Str "blocks",Space,Str "work",Space,Str "as",Space,Str "usual."]
,CodeBlock ("",["haskell"],[]) "main :: IO ()"]
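The refactoring of script suggested above could start from a helper like the following. It is a minimal sketch, not part of the original filters: element, renderAttr and escapeAttr are hypothetical names, the Attr alias only mirrors the (identifier, classes, key-value pairs) triple seen in the native output, and it uses nothing beyond base.

```haskell
-- Same shape as the attribute triple in the native output above:
-- (identifier, classes, key-value pairs). Assumption: String-based, as in
-- the filters in this answer.
type Attr = (String, [String], [(String, String)])

-- Escape the characters that are unsafe inside a double-quoted attribute value.
escapeAttr :: String -> String
escapeAttr = concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Render an Attr as the attribute part of an opening tag. Each attribute is
-- prefixed with a space, so an empty Attr renders as the empty string.
renderAttr :: Attr -> String
renderAttr (ident, classes, kvs) = concatMap render pairs
  where
    pairs = [("id", ident) | not (null ident)]
         ++ [("class", unwords classes) | not (null classes)]
         ++ kvs
    render (k, v) = " " ++ k ++ "=\"" ++ escapeAttr v ++ "\""

-- Wrap already-rendered inner HTML in an arbitrary tag with the given Attr.
element :: String -> Attr -> String -> String
element tag attr inner =
  "<" ++ tag ++ renderAttr attr ++ ">" ++ inner ++ "</" ++ tag ++ ">"
```

For example, element "p" ("", ["text"], []) "This is regular text." evaluates to <p class="text">This is regular text.</p>, and an Attr holding ("data-id", "test-123") yields data-id="test-123" on whatever tag is chosen. The resulting string could then be emitted as raw HTML, in the same way script does with rawInline.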

If you'd like to play around with either of these examples, they're both in my pandoc-experiments repository.

answered Oct 04 '22 18:10 by Sage Mitchell