
Generate a table of contents from Markdown in PHP

I would like to create a table of contents from Markdown, the way stackedit.io (https://stackedit.io/editor#table-of-contents) does when you insert:

[TOC]

Is there any way to generate this from Markdown in PHP?

E.g. if you have:

## header 1
## header 2

ToC should be:

<ol>
   <li><a href="#header1">Header 1</a></li>
   <li><a href="#header2">Header 2</a></li>
</ol>

Should I create my own markdown parser just to get the ToC?

giò asked Aug 18 '15 09:08


1 Answer

The following function does the essential job: it returns a JSON list of the titles it finds, each with its level and text.
This JSON can then be used to generate the needed HTML structure, or anything else.

Schematically it works like this:

  1. Get the Markdown file as a string and normalize line breaks to \n only (this matters for step 3 below)
  2. Apply a simple regexp, /^(?:=|-|#).*$/m, with PREG_OFFSET_CAPTURE. It matches every line that either:
    • is the "underliner" of an <h1> (when "=") or <h2> (when "-") title
    • is a title by itself (beginning with "#")
  3. Iterate over the matched lines:
    • for an "underliner", find the previous line in the source, i.e. the string between the preceding line break and the current line's offset; the level comes from the underliner type and the text from that previous line
    • otherwise, take both level and text from the current line itself
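To make step 2 more concrete, here is a minimal sketch of what the pattern captures. The sample string is made up for illustration; each match comes back as a pair of the matched line and its byte offset in the source, and that offset is what step 3 uses to find the previous line.

```php
<?php
// Hypothetical sample input, not taken from the question.
$source = "Title\n=====\n\n## Section\n";

// Same pattern as in step 2; each match is [matched line, byte offset].
preg_match_all('/^(?:=|-|#).*$/m', $source, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$line, $offset]) {
    printf("offset %d: %s\n", $offset, $line);
}
// prints:
// offset 6: =====
// offset 13: ## Section
```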

Here is the function:

function markdown_toc($file_path) {
  $file = file_get_contents($file_path);

  // ensure using only "\n" as line-break
  $source = str_replace(["\r\n", "\r"], "\n", $file);

  // look for markdown TOC items
  preg_match_all(
    '/^(?:=|-|#).*$/m',
    $source,
    $matches,
    PREG_PATTERN_ORDER | PREG_OFFSET_CAPTURE
  );

  // preprocess: iterate matched lines to create an array of items
  // where each item is an array(level, text)
  $raw_toc = []; // stays an empty list when no title is found
  $file_size = strlen($source);
  foreach ($matches[0] as $item) {
    $found_mark = substr($item[0], 0, 1);
    if ($found_mark == '#') {
      // text is the found item; level is the number of leading "#"
      // (strspn, so a "#" inside the title does not change the level)
      $item_level = strspn($item[0], '#');
      $item_text = substr($item[0], $item_level);
    } else {
      // text is the previous line (empty if <hr>)
      $item_offset = $item[1];
      if ($item_offset < 2) {
        // underliner at the very start of the file: no title line above it
        $item_text = '';
      } else {
        // last "\n" before the previous line (false, cast to 0, if that line is the first one)
        $prev_line_offset =
          (int) strrpos($source, "\n", -($file_size - $item_offset + 2));
        $item_text =
          substr($source, $prev_line_offset, $item_offset - $prev_line_offset - 1);
      }
      $item_text = trim($item_text);
      $item_level = $found_mark == '=' ? 1 : 2;
    }
    if (trim($item_text) === '' || strpos($item_text, '|') !== false) {
      // item is a horizontal separator or a table header: skip it
      continue;
    }
    $raw_toc[] = ['level' => $item_level, 'text' => trim($item_text)];
  }

  // create a JSON list (the easiest way to generate HTML structure is using JS)
  return json_encode($raw_toc);
}
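If you would rather build the HTML on the PHP side, the JSON list can be turned into nested <ol> elements along the following lines. This is only a sketch: toc_slug() and toc_to_html() are hypothetical helper names, and the anchor scheme (lowercased text with everything but a-z and 0-9 removed, as in the question's "#header1") is an assumption that has to match whatever ids your Markdown renderer puts on the headings.

```php
<?php
// Hypothetical helper: derives an anchor id from heading text.
// Assumption: ids are the lowercased text with everything but a-z and 0-9
// removed ("header 1" -> "header1", as in the question's example).
function toc_slug(string $text): string {
    return preg_replace('/[^a-z0-9]+/', '', strtolower($text));
}

// Hypothetical helper: turns a JSON list of {level, text} items
// (the shape returned by markdown_toc()) into nested <ol> lists.
function toc_to_html(string $json): string {
    $items = json_decode($json, true) ?: [];
    $html = '';
    $stack = []; // heading levels of the currently open <ol> elements

    foreach ($items as $item) {
        $level = (int) $item['level'];
        $text = trim(strip_tags($item['text']));

        // leaving deeper levels: close their lists
        while (count($stack) > 1 && $level < end($stack)) {
            $html .= '</li></ol>';
            array_pop($stack);
        }
        if (!$stack || $level > end($stack)) {
            $html .= '<ol>';   // first item, or a deeper level: open a list
            $stack[] = $level;
        } else {
            $html .= '</li>';  // same level: close the previous sibling
        }
        $html .= sprintf(
            '<li><a href="#%s">%s</a>',
            toc_slug($text),
            htmlspecialchars($text)
        );
    }
    // close whatever is still open
    return $html . str_repeat('</li></ol>', count($stack));
}

echo toc_to_html('[{"level":2,"text":"header 1"},{"level":2,"text":"header 2"}]');
// prints: <ol><li><a href="#header1">header 1</a></li><li><a href="#header2">header 2</a></li></ol>
```

Deeper levels are nested inside the <li> of the heading above them, so a level 3 title under a level 2 title ends up in its own inner <ol>.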

Here is the JSON that markdown_toc() returns for the home page of the link you provided:

[
  {"level":1,"text":"Welcome to StackEdit!"},
  {"level":2,"text":"Documents"},
  {"level":4,"text":"<\/i> Create a document"},
  {"level":4,"text":"<\/i> Switch to another document"},
  {"level":4,"text":"<\/i> Rename a document"},
  {"level":4,"text":"<\/i> Delete a document"},
  {"level":4,"text":"<\/i> Export a document"},
  {"level":2,"text":"Synchronization"},
  {"level":4,"text":"<\/i> Open a document"},
  {"level":4,"text":"<\/i> Save a document"},
  {"level":4,"text":"<\/i> Synchronize a document"},
  {"level":4,"text":"<\/i> Manage document synchronization"},
  {"level":2,"text":"Publication"},
  {"level":4,"text":"<\/i> Publish a document"},
  {"level":2,"text":"- Markdown, to publish the Markdown text on a website that can interpret it (**GitHub** for instance),"},
  {"level":2,"text":"- HTML, to publish the document converted into HTML (on a blog for example),"},
  {"level":4,"text":"<\/i> Update a publication"},
  {"level":4,"text":"<\/i> Manage document publication"},
  {"level":2,"text":"Markdown Extra"},
  {"level":3,"text":"Tables"},
  {"level":3,"text":"Definition Lists"},
  {"level":3,"text":"Fenced code blocks"},
  {"level":3,"text":"Footnotes"},
  {"level":3,"text":"SmartyPants"},
  {"level":3,"text":"Table of contents"},
  {"level":3,"text":"MathJax"},
  {"level":3,"text":"UML diagrams"},
  {"level":3,"text":"Support StackEdit"}
]
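Note the two level-2 entries that start with "- ": they are list items, not titles. The pattern accepts any line beginning with "-" as an underliner, so the second item of a bullet list makes the first one look like an <h2>. One possible refinement, shown here only as a sketch on a made-up sample, is to require that an underliner consist solely of "=" or "-" characters:

```php
<?php
// Hypothetical sample: an underlined title followed by a bullet list.
$source = "Publication\n-----------\n\n"
        . "- Markdown, to publish the text\n"
        . "- HTML, to publish the document\n";

$loose  = '/^(?:=|-|#).*$/m';            // pattern used in markdown_toc()
$strict = '/^(?:=+|-+)[ \t]*$|^#.*$/m';  // underliners must contain only "=" or "-"

preg_match_all($loose, $source, $loose_matches);
preg_match_all($strict, $source, $strict_matches);

print_r($loose_matches[0]);  // the underliner plus both list items
print_r($strict_matches[0]); // only the underliner
```

A plain "---" separator still matches the stricter pattern, but the empty-text check in the function already filters it out when the line above it is blank.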
cFreed answered Oct 19 '22 11:10