
Make master table of contents covering several separate input files using pandoc


I am building a document preparation system, which should be capable of rendering fairly large documents.

The input is in pandoc flavoured markdown. To make the documents more manageable, there will be one markdown file per section of the document. For instance, a document might look like this:

File: 01_introduction.md

    Introduction
    ============

    This is the introduction

    Section 1.1
    -----------

    This is a section

    Section 1.2
    -----------

    This is another section

File: 02_functionaldescription.md

    Functional Description
    ======================

    Section 2.1
    -----------

    This is a section

    Section 2.2
    -----------

    This is another section

One of the output formats is to be HTML. I would like to produce one HTML output file per section (corresponding to the input files) and one master table of contents page. The master TOC page should contain links to the headings in the other pages.

I have no problem getting pandoc to produce the individual section HTML files. I can even get it to correct the section numbering so that everything fits together as though it were one big document. Using a filter, I have managed to correct the inter-section links as well.

The problem is the master table of contents. If I feed all the individual files to pandoc on one command line, like this:

    pandoc -f markdown -t html --number-sections --toc -s *.md

then the TOC that is output looks like this:

    <ul>
    <li><a href="#introduction"><span class="toc-section-number">1</span> Introduction</a><ul>
    <li><a href="#section-1.1"><span class="toc-section-number">1.1</span> Section 1.1</a></li>
    <li><a href="#section-1.2"><span class="toc-section-number">1.2</span> Section 1.2</a></li>
    </ul></li>
    <li><a href="#functional-description"><span class="toc-section-number">2</span> Functional Description</a><ul>
    <li><a href="#section-2.1"><span class="toc-section-number">2.1</span> Section 2.1</a></li>
    <li><a href="#section-2.2"><span class="toc-section-number">2.2</span> Section 2.2</a></li>
    </ul></li>
    </ul>

The hrefs are all fragment identifiers that assume the link targets are in the same document. I need them to point to the file that actually contains each heading, like this:

    <a href="introduction.html#section-1.1">

I have not been able to make a filter work reliably: by the time the document reaches the filter, all the files have been concatenated together, with nothing to show where each file begins or ends.

The only solutions I have come up with so far involve using something other than pandoc to produce the TOC, or post-processing the TOC. These solutions seem complex, so I would like to avoid them if possible.

asked Mar 08 '17 06:03 by harmic



1 Answer

"...by the time it reaches the filter, all the files have been concatenated together with nothing to show where each file begins or ends."

That's correct, which means there are only two options:

  1. Process each file separately, creating separate TOCs, then combine the TOCs while fixing the URLs (quite cumbersome).

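  2. Exploit the fact that, in the TOC you posted, each top-level list item corresponds to a different input file, in the order the files were given (this holds as long as each file contains exactly one top-level heading). We can then run pandoc once to create one big file, and apply a filter to that file which leaves us with only the TOC, with corrected links.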
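A minimal sketch of the link-fixing step in approach #2, written in Python because fixtoc.py is a Python filter. It assumes the TOC has already been turned into pandoc's JSON AST in the pandoc 2.x layout, where a BulletList node is {"t": "BulletList", "c": [item, ...]} with each item a list of blocks, and a Link is [attr, inlines, [url, title]]. The function names are hypothetical and this is not the code of fixtoc.py; it simply pairs top-level item i with the i-th output file and prefixes every in-page link inside that item.

```python
from typing import Any, List


def _prefix_links(node: Any, html_file: str) -> None:
    """Recursively rewrite in-page links ("#id") under node to "html_file#id".

    Assumes the pandoc 2.x JSON layout, where a Link is
    {"t": "Link", "c": [attr, inlines, [url, title]]}.
    """
    if isinstance(node, dict):
        if node.get("t") == "Link":
            url, title = node["c"][2]
            if url.startswith("#"):  # only same-document fragments need fixing
                node["c"][2] = [html_file + url, title]
        for value in node.values():
            _prefix_links(value, html_file)
    elif isinstance(node, list):
        for child in node:
            _prefix_links(child, html_file)


def prefix_toc_links(bullet_list: dict, html_files: List[str]) -> dict:
    """Pair top-level TOC item i with html_files[i] and fix every link inside it.

    bullet_list is the TOC's outermost BulletList node; nested sub-lists are
    reached through the recursion above. Hypothetical helper, not fixtoc.py.
    """
    items = bullet_list["c"]
    if len(items) != len(html_files):
        # One top-level entry per file is the whole premise of this approach.
        raise ValueError(
            f"TOC has {len(items)} top-level entries, "
            f"but {len(html_files)} output files were given"
        )
    for item, html_file in zip(items, html_files):
        _prefix_links(item, html_file)
    return bullet_list
```

For example, calling prefix_toc_links(toc, ["01_introduction.html", "02_functionaldescription.html"]) on the TOC from the question turns #section-1.1 into 01_introduction.html#section-1.1. Raising on a count mismatch is deliberate: if some file has zero or two top-level headings, the pairing is wrong, and silently producing bad links would be worse.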

Below I show how to do approach #2 with a filter:

  1. Place this filter in your folder: https://github.com/sergiocorreia/panflute-filters/blob/master/filters/fixtoc.py

  2. Run

    pandoc --number-sections --file-scope --toc -s *.md | pandoc -s -f html -o toc.html -F fixtoc.py -M files:"*.md"

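This calls pandoc twice: the first pass creates the TOC with incorrect links, and the second pass fixes the TOC and deletes everything else, based on the metadata it receives (the files field, which identifies the input files). The quotes around *.md stop the shell from expanding the pattern, so the second pass receives the pattern itself rather than a list of filenames.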
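Besides fixing links, the second pass has two jobs: working out which output file each top-level entry belongs to, and throwing away everything except the TOC. Below is a minimal sketch of both, assuming pandoc's 2.x JSON AST and a filter that expands the files pattern itself. meta_to_text, html_targets and toc_only are hypothetical helpers, not the contents of fixtoc.py. Note also that with HTML5 output, newer pandoc versions may write the TOC as <nav id="TOC"> rather than a div, so check how your version reads it back before relying on the Div test.

```python
import glob
import os


def meta_to_text(value: dict) -> str:
    """Flatten a pandoc JSON metadata value to a plain string.

    Accepts MetaString, or MetaInlines made only of Str/Space nodes; which one
    -M produces is assumed to vary with the pandoc version.
    """
    if value["t"] == "MetaString":
        return value["c"]
    if value["t"] == "MetaInlines":
        parts = []
        for inline in value["c"]:
            if inline["t"] == "Str":
                parts.append(inline["c"])
            elif inline["t"] == "Space":
                parts.append(" ")
            else:
                raise ValueError(f"unsupported inline in metadata: {inline['t']}")
        return "".join(parts)
    raise ValueError(f"unsupported metadata type: {value['t']}")


def html_targets(meta: dict, directory: str = ".") -> list:
    """Expand meta["files"] (e.g. "*.md") into names such as "01_introduction.html".

    sorted() uses code-point order, which matches the shell's expansion for
    zero-padded names like 01_..., 02_...; other names or locales may differ.
    """
    pattern = meta_to_text(meta["files"])
    sources = sorted(glob.glob(os.path.join(directory, pattern)))
    return [os.path.splitext(os.path.basename(p))[0] + ".html" for p in sources]


def toc_only(doc: dict) -> dict:
    """Keep only the top-level Div whose id is "TOC" and drop every other block.

    Assumes the HTML reader turned <div id="TOC"> into a Div, whose attr is
    block["c"][0] == [id, classes, key_values].
    """
    toc = [b for b in doc["blocks"] if b["t"] == "Div" and b["c"][0][0] == "TOC"]
    if not toc:
        raise ValueError('no Div with id "TOC" found in the re-read HTML')
    doc["blocks"] = toc
    return doc
```

In a JSON filter, doc = json.load(sys.stdin) provides both inputs: html_targets(doc["meta"]) yields the filenames and toc_only(doc) does the trimming, after which json.dump(doc, sys.stdout) hands the result back to pandoc.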

The output is then stored in toc.html (or whatever name you set), and looks like this:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
      <meta http-equiv="Content-Style-Type" content="text/css" />
      <meta name="generator" content="pandoc" />
      <title></title>
      <style type="text/css">code{white-space: pre;}</style>
    </head>
    <body>
    <div id="TOC">
    <ul>
    <li><a href="01_introduction.html#introduction"><span class="toc-section-number">1</span> Introduction</a>
    <ul>
    <li><a href="01_introduction.html#section-1.1"><span class="toc-section-number">1.1</span> Section 1.1</a></li>
    <li><a href="01_introduction.html#remarks"><span class="toc-section-number">1.2</span> Remarks</a></li>
    </ul></li>
    <li><a href="02_functionaldescription.html#functional-description"><span class="toc-section-number">2</span> Functional Description</a>
    <ul>
    <li><a href="02_functionaldescription.html#section-2.1"><span class="toc-section-number">2.1</span> Section 2.1</a></li>
    <li><a href="02_functionaldescription.html#remarks"><span class="toc-section-number">2.2</span> Remarks</a></li>
    </ul></li>
    </ul>
    </div>
    </body>
    </html>
answered Sep 23 '22 09:09 by Sergio Correia