
Is it possible to use YAML metadata blocks to extend pandoc syntax?

First a little bit of context:

I'm writing an academic article with pandoc/YAML + Leo Editor. With this combination I can write in a really organic way. The Leo Editor tree is used to organize the writing in a non-linear fashion, so I can see the main topics with their subtopics nested under them, choose what to focus on in the next writing session, put some parts of the writing on hold, and so on. YAML nodes in the tree store the bibliographical references, and a custom-made script node converts that Leo tree to pandoc's markdown; that file is then used to create the PDF.

Today I wrote something like this:

See the image [#hs-world-map]

---
type: image
file: ../Imagenes/hackerspaces-mapa-2014-ene.png
scale: 50
alias: hs-world-map
caption: |
    World map of the hackerspaces registered at http://hackerspaces.org as of
    January 4, 2014. Hackerspace concentrations are denoted by two indicators:
    number and color. Red colors and large numbers indicate a higher
    concentration of hackerspaces, followed by orange with medium numbers, and
    ending with blue, with small numbers. The map shows that this is a global
    phenomenon with Anglo-European predominance (the east coast of the United
    States has 110 hackerspaces and Europe 175) and lower prominence in South
    America, India, China and Africa. Some of the contrasts regarding hacker
    culture and how it is contextualized in the Global North and in the Global
    South that have been mentioned in this text become evident in this map.


... 

This is a YAML block inside a pandoc markdown document (the leading "---" is not shown properly), defining some properties and syntax I would like to have for images in pandoc, such as scale, alias, and a better way to support long captions. Outside the YAML block I have put a reference to the figure's alias using an invented shorthand ("[#hs-world-map]"), similar to the [@cite] shorthand for bibliographic references.

I have seen from the Lua example and the pandoc scripting guide that it is possible to write custom writers that modify pandoc's output, but I don't know how to extract data from the YAML blocks, or whether my own shorthand for cross-referencing figures ([#alias]) will work. So my question is:

  • Is there any example of how to extract data from YAML blocks in pandoc's markdown and insert that data into a modified output (preferably LaTeX and HTML)? I wouldn't mind learning Lua if necessary, but it would be better if the example were in Python, so I can focus on writing the article.

(I think this custom syntax, combining YAML blocks and custom writers, could be a way to evolve pandoc; at the least, it is a good experiment in how this can be done.)

Offray, asked Jan 05 '14 01:01


2 Answers

What I've found is that it's not possible to do what you want.

The documentation says that a document can contain more than one YAML block, but they are merged into a single one, always keeping the first occurrence of each attribute.

Let's consider this example document, which I'll call test.md:

---
a: Hola
b: mundo
...

---
a: Lorem
c: ipsum
...

If you convert it to pandoc's native representation, you'll notice that the second definition of a is lost, and that there is no way to tell the two blocks apart:

$ pandoc test.md -t native -s
Pandoc (Meta {unMeta = fromList [
    ("a",MetaInlines [Str "Hola"]),
    ("b",MetaInlines [Str "mundo"]),
    ("c",MetaInlines [Str "ipsum"])
]})

So, while there can be multiple YAML blocks, they are considered parts of a single metadata object.
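The merged metadata object can still be read from Python. Here is a minimal sketch, assuming the JSON layout that recent pandoc versions emit with `pandoc test.md -t json` (a top-level object with a "meta" key). The sample string is hand-written to mirror the native output above, and the helper names (`inlines_to_text`, `meta_to_python`, `read_metadata`) are made up for illustration.

```python
import json

# Hand-written sample, assumed to match what `pandoc test.md -t json`
# prints for the example document (metadata only, no body blocks).
SAMPLE = """
{"pandoc-api-version": [1, 23],
 "meta": {
   "a": {"t": "MetaInlines", "c": [{"t": "Str", "c": "Hola"}]},
   "b": {"t": "MetaInlines", "c": [{"t": "Str", "c": "mundo"}]},
   "c": {"t": "MetaInlines", "c": [{"t": "Str", "c": "ipsum"}]}
 },
 "blocks": []}
"""


def inlines_to_text(inlines):
    """Flatten inline nodes to plain text (only Str and spaces are handled)."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] in ("Space", "SoftBreak"):
            parts.append(" ")
    return "".join(parts)


def meta_to_python(node):
    """Convert one pandoc metadata node into a plain Python value."""
    kind, content = node["t"], node.get("c")
    if kind == "MetaInlines":
        return inlines_to_text(content)
    if kind == "MetaList":
        return [meta_to_python(item) for item in content]
    if kind == "MetaMap":
        return {key: meta_to_python(value) for key, value in content.items()}
    # MetaString and MetaBool carry their value directly;
    # MetaBlocks is returned as the raw list of block nodes.
    return content


def read_metadata(json_text):
    """Return the document metadata as a dict of plain Python values."""
    doc = json.loads(json_text)
    return {key: meta_to_python(value) for key, value in doc["meta"].items()}


metadata = read_metadata(SAMPLE)
print(metadata)  # {'a': 'Hola', 'b': 'mundo', 'c': 'ipsum'}
```

In a real pipeline the JSON would come from pandoc's standard output instead of a hard-coded string. The script sees only one metadata dictionary, which is consistent with the merge behavior described above.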

Roberto Bonvallet, answered Sep 24 '22 20:09


Yes,

  1. there is an easy way to extract YAML data with pandoc and use it to generate LaTeX: edit the template. There is an important limitation, though.

  2. an example is in the LaTeX template.

To get the full LaTeX template, use

pandoc -D latex

The relevant part is the code that extracts the authors from the metadata.

$if(author)$
\author{$for(author)$$author$$sep$ \and $endfor$}
$endif$

It will extract multiple authors from this YAML metadata:

---
author:
    - Mr. Smart
    - Mr. Brilliant
...
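To make the loop semantics concrete, here is a rough Python equivalent of what the template fragment produces for that metadata. It is only an illustration of the `$for$`/`$sep$` behavior, not pandoc's implementation, and `render_authors` is a made-up name.

```python
def render_authors(authors):
    # An empty or missing list corresponds to $if(author)$ being false,
    # so nothing is emitted at all.
    if not authors:
        return ""
    # The text after $sep$ (" \and ") goes between items,
    # never after the last one, which is what str.join does.
    return "\\author{" + " \\and ".join(authors) + "}"


print(render_authors(["Mr. Smart", "Mr. Brilliant"]))
# prints: \author{Mr. Smart \and Mr. Brilliant}
```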

You could extend the template to

$if(author)$
  \author{
    $for(author)$
      $author.name$ \\
      $author.email$
      $sep$ \par
    $endfor$
  }
$endif$

And use this YAML as input:

---
author:
    - name: Mr. Smart
      email: [email protected]
    - name: Mr. Brilliant
      email: [email protected]
...

So there is an important limitation: all YAML entries of the same kind must be listed together under a single key, with no other YAML in between, and each entry must start with "-".

I am 'abusing' the YAML metadata in this way to write the full content of evaluation documents in a very minimalistic YAML syntax that is easy to write now and will simplify automatic processing in the future. I use pandoc as an easy-to-use YAML to LaTeX (PDF) converter.

It might be worth filing a feature request to improve pandoc's YAML reader so that it also accepts multiple fields with the same name (e.g. author) and allows looping through them.
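Until then, the list-of-maps layout shown above could carry the image properties from the question. The following Python sketch rests on several assumptions that are not part of pandoc: the entries would live under one hypothetical `figures:` key, `scale` is read as a percentage, labels get a `fig:` prefix, and the `[#alias]` shorthand is resolved by a plain regular expression before the text reaches LaTeX. The caption is shortened to a placeholder.

```python
import re

# Hypothetical metadata, shaped like a YAML list under a single `figures:` key.
figures = [
    {
        "file": "../Imagenes/hackerspaces-mapa-2014-ene.png",
        "scale": 50,  # assumed to mean percent
        "alias": "hs-world-map",
        "caption": "World map of hackerspaces.",  # shortened placeholder
    },
]


def figure_to_latex(fig):
    """Build a LaTeX figure environment from one metadata entry."""
    return "\n".join([
        "\\begin{figure}",
        "\\centering",
        "\\includegraphics[scale=%.2f]{%s}" % (fig["scale"] / 100, fig["file"]),
        "\\caption{%s}" % fig["caption"],
        "\\label{fig:%s}" % fig["alias"],
        "\\end{figure}",
    ])


def link_references(text, figs):
    """Replace the invented [#alias] shorthand with a LaTeX reference."""
    known = {fig["alias"] for fig in figs}

    def replace(match):
        alias = match.group(1)
        # Unknown aliases are left untouched rather than guessed at.
        if alias not in known:
            return match.group(0)
        return "\\ref{fig:%s}" % alias

    return re.sub(r"\[#([\w-]+)\]", replace, text)


print(link_references("See the image [#hs-world-map]", figures))
# prints: See the image \ref{fig:hs-world-map}
print(figure_to_latex(figures[0]))
```

The generated LaTeX relies on the graphicx package for `\includegraphics`. An HTML variant would only need a different `figure_to_latex` counterpart and an anchor link in place of `\ref`.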

Bert, answered Sep 21 '22 20:09