Parsing elements from a markdown file in python 3

Tags: python, markdown, multimarkdown

How might I get a list of elements from a markdown file in Python 3? I'm specifically interested in getting a list of all images and links (along with relevant information like alt text and link text) out of a markdown file.

There is some prior art in this area, but it is almost exactly two years old at this point, and I expect that the landscape has changed a bit.

Bonus points if the parser you come up with supports multimarkdown.

asked Dec 03 '16 07:12

Andrew Spott

1 Answer

If you take advantage of two Python packages, pypandoc and panflute, you can do it quite pythonically in a few lines:

Given a Markdown file example.md, and assuming you have Python 3.3+ and have already run pip install pypandoc panflute, place the sample code in the same folder and run it from the shell or from, e.g., IDLE. Note that pypandoc is only a wrapper around pandoc, so the pandoc executable itself must also be installed.

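import io

import pypandoc
import panflute


def prepare(doc):
    # Runs once before the walk: create the lists that action() fills in
    doc.images = []
    doc.links = []


def action(elem, doc):
    # Runs on every element of the AST; returning None leaves elem unchanged
    if isinstance(elem, panflute.Image):
        doc.images.append(elem)
    elif isinstance(elem, panflute.Link):
        doc.links.append(elem)


if __name__ == '__main__':
    # Ask pandoc (through pypandoc) for the document's AST as a JSON string
    data = pypandoc.convert_file('example.md', 'json')
    # panflute.load() expects a stream, so wrap the string in StringIO
    doc = panflute.load(io.StringIO(data))
    # Because doc= is given, run_filter returns the Doc instead of writing to stdout
    doc = panflute.run_filter(action, prepare=prepare, doc=doc)

    print("\nList of image URLs:")
    for image in doc.images:
        print(image.url)

    print("\nList of links (URL, link text):")
    for link in doc.links:
        print(link.url, panflute.stringify(link))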
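On the MultiMarkdown bonus: pypandoc does not parse anything itself, it hands the file to the pandoc executable, and pandoc has a MultiMarkdown reader called `markdown_mmd`. With pypandoc that should just be `pypandoc.convert_file('example.md', 'json', format='markdown_mmd')`. If you would rather skip pypandoc, here is a minimal sketch of the same first step using only the standard library (Python 3.7+ for `capture_output`). It assumes pandoc is installed and on your PATH, and `markdown_to_ast` is a hypothetical helper name, not part of either package:

```python
import json
import subprocess
from typing import Any


def markdown_to_ast(path: str, input_format: str = "markdown") -> Any:
    """Run pandoc on `path` and return its JSON AST as Python objects.

    Assumes the pandoc executable is installed and on PATH; if it is not,
    subprocess.run raises FileNotFoundError. Pass input_format="markdown_mmd"
    to parse the file with pandoc's MultiMarkdown reader.
    """
    result = subprocess.run(
        ["pandoc", "--from", input_format, "--to", "json", path],
        capture_output=True,  # collect pandoc's stdout/stderr instead of printing them
        encoding="utf-8",     # pandoc writes UTF-8 JSON; decode it as text
        check=True,           # raise CalledProcessError if pandoc exits non-zero
    )
    return json.loads(result.stdout)
```

For example, `markdown_to_ast('example.md', input_format='markdown_mmd')` should give the same AST that `pypandoc.convert_file` returns as a string, only already decoded with `json.loads`.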

The steps are:

1. Use pypandoc to obtain a JSON string containing the AST of the Markdown document.
2. Load it into panflute to create a Doc object (panflute.load() reads from a stream, so the string is wrapped in io.StringIO).
3. Use the run_filter function to call action() on every element and collect the Image and Link objects.
4. Then you can print the URLs, alt text, link text, and so on.
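To make step 3 a bit more concrete: pandoc's JSON AST is just nested lists and dicts in which every element looks like `{"t": type, "c": contents}`, and an Image or Link carries `[attributes, inlines, [url, title]]`. The rough sketch below walks that structure with no panflute at all and collects the URL, title and alt/link text of each image and link. It is a simplified stand-in for what `run_filter` and `panflute.stringify` do, not their actual implementation, and the helper names (`walk`, `stringify`, `images_and_links`) are made up for illustration:

```python
from typing import Any, Dict, Iterator, List


def stringify(node: Any) -> str:
    """Flatten pandoc inline elements into plain text (simplified).

    Handles Str, the whitespace elements and Code; any other element is
    searched recursively for nested inlines. Math and quote marks are dropped.
    """
    if isinstance(node, list):
        return "".join(stringify(item) for item in node)
    if isinstance(node, dict):
        kind = node.get("t")
        if kind == "Str":
            return node["c"]
        if kind in ("Space", "SoftBreak", "LineBreak"):
            return " "
        if kind == "Code":
            return node["c"][1]  # Code contents are [attr, text]
        return stringify(node.get("c", []))
    return ""  # bare strings at this level are URLs, titles, attributes, etc.


def walk(node: Any) -> Iterator[Dict[str, Any]]:
    """Yield every pandoc element (a dict with a "t" key), parents first."""
    if isinstance(node, dict):
        if "t" in node:
            yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def images_and_links(ast: Dict[str, Any]) -> List[Dict[str, str]]:
    """Collect every Image and Link in the document body, in document order."""
    found = []
    for elem in walk(ast["blocks"]):
        if elem["t"] in ("Image", "Link"):
            # Image and Link contents are [attr, inlines, [url, title]]
            _attr, inlines, (url, title) = elem["c"]
            found.append({
                "type": elem["t"],
                "url": url,
                "title": title,
                "text": stringify(inlines),  # alt text for images, link text for links
            })
    return found
```

Feed it the parsed output of the first step, e.g. `images_and_links(json.loads(pypandoc.convert_file('example.md', 'json')))`. The text flattening is deliberately crude, so for anything serious panflute's own `stringify` is the better tool.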

answered Sep 20 '22 11:09

Sergio Correia
