I'm writing a translator from a Markdown-like markup to HTML. I have completed the script, except for ordered/unordered list translation. I want to format lists based on significant whitespace (aka the off-side rule). A valid input looks like this:
:: List item
   top level
    :: List item level 2
    :: List item level 2
        :: List item level 3
            :: List item level 4
    :: List item level 2
:: List item top level
:: denotes a list item. Indentation levels might be arbitrary. Tabs are not significant. I have been working on solutions on paper, but I couldn't figure out a way to implement it. How should I go about this?
P.S.: Any arbitrary number of spaces, as long as it is more than one, denotes a new level, like in Python.
I'm using Python to implement this, but I'm not looking for code. I want an explanation of how to do it. Preferably, I want to implement the complete thing myself, without any libraries. I'm going to use this markup for my Jekyll blog, but this is more than a little tool for me; I want to learn as much as I can about regular expressions and parsing from this project. Thanks in advance.
@delnan's link to the Python reference provides a good approach, but (as the reference itself suggests) Python allows correct indentation that is also confusing to read and (if you try to take advantage of its full liberality) potentially tricky to debug.
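For comparison, here is a minimal sketch of the stack-based rule that the Python reference describes for generating INDENT and DEDENT tokens, adapted to return a nesting depth for each list line instead of tokens. The function name `stack_levels` is hypothetical. The sketch assumes indentation is made of spaces only and that top-level items start in column 0, and it raises `ValueError` where Python would report an inconsistent dedent.

```python
def stack_levels(lines):
    """Return a 1-based nesting depth for each line, using an indentation stack."""
    stack = [0]  # widths of the currently open levels; 0 is always at the bottom
    depths = []
    for line in lines:
        width = len(line) - len(line.lstrip(' '))  # count leading spaces only
        if width > stack[-1]:
            stack.append(width)  # deeper than the current level: open one (INDENT)
        else:
            while width < stack[-1]:
                stack.pop()  # shallower: close levels, one pop per DEDENT
            if width != stack[-1]:
                # the dedent does not return to any width that is still open
                raise ValueError('inconsistent dedent: %r' % line)
        depths.append(len(stack))
    return depths
```

Under this rule a line only has to be deeper than its parent. For example, lines indented 0, 2, 0 and 4 spaces get depths 1, 2, 1, 2, whereas the unique-widths rule suggested below would give 1, 2, 1, 3.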
For your application, it might be less confusing for the user if you required each unique number of indenting spaces to indicate a different list level. For those semantics, you can find the levels for the list in no more than four lines of Python 3. You didn't want to see a solution in code (though I'd be happy to post it if you'd like), so my approach was roughly as follows:
(EDITED to include the code and to handle multi-line list items)
Given:
:: List item
   (this is the second line of the first list item)
    :: List item level 2
    :: List item level 2
        :: List item level 3
            :: List item level 4
    :: List item level 2
:: List item top level
... the function below produces the list:
 :: List item (this is the second line of the first list item)
  :: List item level 2
  :: List item level 2
   :: List item level 3
    :: List item level 4
  :: List item level 2
 :: List item top level
... which I think was the intended result for this test case.
Here's the code, written to accept the list from standard input:
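import sys

def findIndent(lst):
    # given a list of text strings, returns a list containing the
    # indentation level (1 = top level) for each string
    spcCount = [len(s) - len(s.lstrip(' ')) for s in lst]
    # each distinct indentation width, smallest first, is its own level
    indent = sorted(set(spcCount))
    levelRef = {n: i for i, n in enumerate(indent)}
    return [levelRef[n] + 1 for n in spcCount]

lst = []
for li in sys.stdin:
    if not li.strip():
        continue  # ignore blank lines
    if li.lstrip(' ').startswith('::'):
        lst.append(li.rstrip())  # start of a new list item
    elif lst:
        # continuation line: join it onto the previous list item
        lst[-1] = lst[-1] + ' ' + li.lstrip(' ').rstrip()
    else:
        sys.exit('error: text found before the first list item')

# print each item indented by one space per level
for i, li in zip(findIndent(lst), lst):
    print(' ' * i + li.lstrip())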
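The script above stops at printing each item with its level. Since the end goal is HTML, here is one possible way to turn (level, text) pairs into nested `<ul>` elements. It is only a sketch: the name `to_html` is hypothetical, it emits unordered lists only, it does not escape HTML special characters, and it assumes a level never increases by more than one from one item to the next (it raises `ValueError` otherwise, because a `<ul>` nested directly in a `<ul>` without an `<li>` is not valid HTML).

```python
def to_html(items):
    """items: list of (level, text) pairs, level >= 1; returns nested <ul> markup."""
    out = []
    depth = 0  # number of <ul> elements currently open
    for level, text in items:
        if level > depth + 1:
            raise ValueError('level jumps from %d to %d' % (depth, level))
        if level > depth:
            out.append('<ul>')  # open a sublist inside the still-open <li>
            depth += 1
        else:
            out.append('</li>')  # close the previous item
            while depth > level:
                out.append('</ul>')  # close the sublist...
                out.append('</li>')  # ...and the parent item that contained it
                depth -= 1
        out.append('<li>' + text)
    while depth > 0:  # close everything still open at the end
        out.append('</li>')
        out.append('</ul>')
        depth -= 1
    return '\n'.join(out)
```

To feed it from the script, you could strip the leading `::` from each item, e.g. `to_html([(i, li.lstrip()[2:].strip()) for i, li in zip(findIndent(lst), lst)])`. Note that `findIndent` can produce a jump of more than one level (widths 0, 8, 4 give levels 1, 3, 2), which this sketch rejects rather than guessing at the intended structure.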