How might I get a list of elements from a markdown file in python 3? I'm specifically interested in getting a list of all images and links (along with relevant information like alt-text and link text) out of a markdown file.
There is some prior art in this area, but it is almost exactly 2 years old at this point, and I expect that the landscape has changed a bit.
Bonus points if the parser you come up with supports multimarkdown.
You use the open() function to open the Picnic.md file, passing the value 'r' to the mode parameter to signify that Python should open it for reading. You save the file object in a variable called f, which you can use to reference the file. Then you read the file and save its contents in the text variable.
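As a rough sketch of that description, the snippet below wraps the open/read steps in a small helper. The function name read_markdown and the UTF-8 default are assumptions made for illustration, and the demo writes its own throwaway Picnic.md so that it runs without any existing file.

```python
import os
import tempfile


def read_markdown(path, encoding="utf-8"):
    """Return the full text of a Markdown file (hypothetical helper)."""
    # 'r' opens the file for reading; the with-block closes it afterwards.
    with open(path, "r", encoding=encoding) as f:
        text = f.read()
    return text


if __name__ == "__main__":
    # Create a temporary Picnic.md so the sketch is self-contained.
    with tempfile.TemporaryDirectory() as folder:
        sample = os.path.join(folder, "Picnic.md")
        with open(sample, "w", encoding="utf-8") as out:
            out.write("# Picnic\n\n![basket](basket.png)\n")
        print(read_markdown(sample))
```

Using a with-block instead of a bare open() call means the file is closed even if reading fails.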
The reticulate package includes a Python engine for R Markdown that enables easy interoperability between Python and R chunks.
To add a Python code chunk to an R Markdown document, you can use the chunk header ```{python} , e.g., ```{python} print("Hello Python!") ```
If you take advantage of two Python packages, pypandoc and panflute, you could do it quite pythonically in a few lines (sample code):
Given a text file example.md, and assuming you have Python 3.3+ and already did pip install pypandoc panflute, place the sample code in the same folder and run it from the shell or from e.g. IDLE.
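import io

import pypandoc
import panflute


def action(elem, doc):
    # Called once for every element in the document tree.
    if isinstance(elem, panflute.Image):
        doc.images.append(elem)
    elif isinstance(elem, panflute.Link):
        doc.links.append(elem)


if __name__ == '__main__':
    # Convert the markdown file to pandoc's JSON AST.
    data = pypandoc.convert_file('example.md', 'json')
    # panflute.load expects a stream, hence StringIO.
    doc = panflute.load(io.StringIO(data))
    doc.images = []
    doc.links = []
    # No prepare function is needed; the lists are set up above.
    doc = panflute.run_filter(action, doc=doc)

    print("\nList of image URLs:")
    for image in doc.images:
        print(image.url)

    print("\nList of link URLs:")
    for link in doc.links:
        print(link.url)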
The steps are:
1. Use pypandoc to obtain a json string that contains the AST of the markdown document.
2. Load it into panflute to create a Doc object (panflute requires a stream, so we use StringIO).
3. Use the run_filter function to iterate over every element, and extract the Image and Link objects.
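To make the "iterate over every element" step more concrete, here is a minimal sketch that walks a pandoc-style JSON AST with only the standard library and also collects the alt text and link text. SAMPLE_AST is hand-written for illustration, not real pandoc output, and the code assumes that Link and Image nodes store their content as [attributes, inline nodes, [url, title]], which is my understanding of the layout in recent pandoc versions. The helper names plain_text, collect and extract are hypothetical.

```python
import json

# Hand-written stand-in for the JSON that pypandoc would return (assumption).
SAMPLE_AST = json.dumps({
    "pandoc-api-version": [1, 22],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Str", "c": "See"},
            {"t": "Space"},
            {"t": "Link", "c": [
                ["", [], []],
                [{"t": "Str", "c": "the"}, {"t": "Space"},
                 {"t": "Str", "c": "docs"}],
                ["https://example.com/docs", ""],
            ]},
            {"t": "Space"},
            {"t": "Image", "c": [
                ["", [], []],
                [{"t": "Str", "c": "logo"}],
                ["logo.png", "Project logo"],
            ]},
        ]},
    ],
})


def plain_text(inlines):
    """Flatten inline nodes to text; only Str and whitespace nodes are handled."""
    parts = []
    for node in inlines:
        if node.get("t") == "Str":
            parts.append(node["c"])
        elif node.get("t") in ("Space", "SoftBreak"):
            parts.append(" ")
    return "".join(parts)


def collect(node, images, links):
    """Recursively visit every list and dict in the AST."""
    if isinstance(node, list):
        for child in node:
            collect(child, images, links)
    elif isinstance(node, dict):
        tag = node.get("t")
        if tag in ("Image", "Link"):
            _attr, inlines, (url, title) = node["c"]
            entry = {"text": plain_text(inlines), "url": url, "title": title}
            (images if tag == "Image" else links).append(entry)
        # Keep descending so nested nodes (e.g. an image inside a link) are found.
        for value in node.values():
            collect(value, images, links)


def extract(ast_json):
    """Return (images, links) found in a pandoc JSON AST string."""
    images, links = [], []
    collect(json.loads(ast_json), images, links)
    return images, links


if __name__ == "__main__":
    images, links = extract(SAMPLE_AST)
    print("Images:", images)
    print("Links:", links)
```

With panflute installed, panflute.stringify(elem) should give the same alt text or link text directly from an Image or Link element, so the manual walk is mainly useful for seeing what the filter does under the hood.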