Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

extract code from code directive from restructuredtext using docutils

I would like to extract out the source code verbatim from code directives in a restructuredtext string.

What follows is my first attempt at doing this, but I would like to know if there is a better (i.e. more robust, or more general, or more direct) way of doing it.

Let's say I have the following rst text as a string in python:

s = '''

My title
========

Use this to square a number.

.. code:: python

   def square(x):
       return x**2

and here is some javascript too.

.. code:: javascript

    foo = function() {
        console.log('foo');
    }

'''

To get the two code blocks, I could do

from docutils.core import publish_doctree

doctree = publish_doctree(s)
source_code = [child.astext() for child in doctree.children 
if 'code' in child.attributes['classes']]

Now source_code is a list with just the verbatim source code from the two code blocks. I could also use the attributes attribute of child to find out the code types too, if necessary.

It does the job, but is there a better way?

like image 594
mjandrews Avatar asked Feb 02 '15 00:02

mjandrews


2 Answers

Your solution will only find code blocks at the top level of a document, and it may return false positives if the class "code" is used on other elements (unlikely, but possible). I would also check the element/node's type, specified in its .tagname attribute.

There's a "traverse" method on nodes (and a document/doctree is just a special node) that does a full traversal of the document tree. It will look at all elements in a document and return only those that match a user-specified condition (a function that returns a boolean). Here's how:

def is_code_block(node):
    return (node.tagname == 'literal_block'
            and 'code' in node.attributes['classes'])

code_blocks = doctree.traverse(condition=is_code_block)
source_code = [block.astext() for block in code_blocks]
like image 172
David Goodger Avatar answered Nov 15 '22 04:11

David Goodger


This can be further simplified like::

source_code = [block.astext() for block in doctree.traverse(nodes.literal_block)
               if 'code' in block.attributes['classes']]
like image 28
G. Milde Avatar answered Nov 15 '22 03:11

G. Milde