
Convert tabbed text to html unordered list?

Tags: python, html

I'm a beginner programmer, so this question might sound trivial. I have some text files containing tab-delimited text like:

A
    B
    C
        D
        E

Now I want to generate unordered HTML lists from this, with the structure:

<ul>
<li>A
<ul><li>B</li>
<li>C
<ul><li>D</li>
<li>E</li></ul></li></ul></li>
</ul>

My idea was to write a Python script, but if there is an easier (automatic) way, that is fine too. For identifying the indentation level and item name I would try to use this code:

import sys
indent = 0
last = []
for line in sys.stdin:
    count = 0
    while line.startswith("\t"):
        count += 1
        line = line[1:]
    if count > indent:
        indent += 1
        last.append(last[-1])
    elif count < indent:
        indent -= 1
        last = last[:-1]
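
A minimal sketch of the per-line step described above (finding the indentation level and the item name), assuming one tab per level. The names `split_indent` and `indent_unit` are made up for illustration and are not part of the code above:

```python
def split_indent(line, indent_unit="\t"):
    """Return (depth, name) for one input line (hypothetical helper)."""
    stripped = line.rstrip("\r\n")
    depth = 0
    # Peel off one indentation unit at a time and count how many there were.
    while stripped.startswith(indent_unit):
        depth += 1
        stripped = stripped[len(indent_unit):]
    return depth, stripped.strip()


sample = "A\n\tB\n\tC\n\t\tD\n\t\tE\n"
items = [split_indent(line) for line in sample.splitlines() if line.strip()]
print(items)
```

Each pair holds the nesting depth and the text of one list item, which is the information needed to decide when to open or close a `<ul>`.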
Elip asked Aug 28 '12 16:08


2 Answers

Try this (works on your test case):

import itertools

def listify(filepath):
    depth = 0
    print("<ul>" * (depth + 1))  # open the top-level list
    with open(filepath) as f:
        for line in f:
            line = line.rstrip()
            # indentation level = number of leading tabs
            newDepth = sum(1 for i in itertools.takewhile(lambda c: c == '\t', line))
            if newDepth > depth:
                print("<ul>" * (newDepth - depth))    # going deeper
            elif depth > newDepth:
                print("</ul>" * (depth - newDepth))   # coming back up
            print("<li>%s</li>" % line.strip())
            depth = newDepth
    print("</ul>" * (depth + 1))  # close every list that is still open
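
The function above closes each `<li>` before opening a child `<ul>`, so the output differs from the structure in the question, where child lists sit inside their parent `<li>`. Here is a rough Python 3 sketch of one way to get that nesting. It assumes tabs are the only indentation and that a child is never more than one level deeper than its parent; `tabbed_to_ul` is a made-up name:

```python
import html


def tabbed_to_ul(lines):
    """Return nested <ul> markup for tab-indented lines (hypothetical helper)."""
    out = []
    depth = -1  # -1 means no <ul> has been opened yet
    for raw in lines:
        text = raw.rstrip("\r\n")
        if not text.strip():
            continue  # skip blank lines
        new_depth = len(text) - len(text.lstrip("\t"))
        # Assumption: a child is at most one level deeper than its parent.
        new_depth = min(new_depth, depth + 1)
        if new_depth > depth:
            out.append("<ul>")  # nested list goes inside the still-open <li>
        else:
            out.append("</li>")  # close the previous item
            out.append("</ul></li>" * (depth - new_depth))  # climb back up
        out.append("<li>" + html.escape(text.strip()))
        depth = new_depth
    if depth >= 0:
        # Close the last item and every list that is still open.
        out.append("</li>" + "</ul></li>" * depth + "</ul>")
    return "".join(out)


sample = "A\n\tB\n\tC\n\t\tD\n\t\tE\n"
nested = tabbed_to_ul(sample.splitlines())
print(nested)
```

The idea is to leave each `<li>` open until the next line shows whether it has children, and only then emit either `</li>` or a nested `<ul>`.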

Hope this helps

inspectorG4dget answered Sep 28 '22 00:09


The tokenize module understands your input format: each line contains a valid Python identifier, and the indentation level is significant. The ElementTree module lets you manipulate tree structures in memory, so it might be more flexible to separate building the tree from rendering it as HTML:

from tokenize import NAME, INDENT, DEDENT, ENDMARKER, NEWLINE, generate_tokens
from xml.etree import ElementTree as etree

def parse(file, TreeBuilder=etree.TreeBuilder):
    tb = TreeBuilder()
    tb.start('ul', {})
    for type_, text, start, end, line in generate_tokens(file.readline):
        if type_ == NAME: # convert name to <li> item
            tb.start('li', {})
            tb.data(text)
            tb.end('li')
        elif type_ == NEWLINE:
            continue
        elif type_ == INDENT: # start <ul>
            tb.start('ul', {})
        elif type_ == DEDENT: # end </ul>
            tb.end('ul')
        elif type_ == ENDMARKER: # done
            tb.end('ul') # end parent list
            break
        else: # unexpected token
            assert 0, (type_, text, start, end, line)
    return tb.close() # return root element

Any class that provides .start(), .end(), .data(), and .close() methods can be used as a TreeBuilder; e.g., you could just write HTML on the fly instead of building a tree.
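
As a rough illustration of that idea, here is a sketch of a class with those four methods that writes tags to a file-like object as they arrive instead of building a tree. `HtmlWriter` is a made-up name, not part of ElementTree, and the calls at the bottom imitate by hand what parse() would issue for a single item:

```python
import io
from html import escape


class HtmlWriter:
    """Hypothetical stand-in for TreeBuilder that writes tags immediately."""

    def __init__(self, out):
        self.out = out  # any object with a .write(str) method

    def start(self, tag, attrs):
        rendered = "".join(
            ' %s="%s"' % (key, escape(str(value))) for key, value in attrs.items()
        )
        self.out.write("<%s%s>" % (tag, rendered))

    def data(self, text):
        self.out.write(escape(text))  # escape &, <, > in item text

    def end(self, tag):
        self.out.write("</%s>" % tag)

    def close(self):
        return self.out  # nothing was built, so hand back the stream


buf = io.StringIO()
writer = HtmlWriter(buf)
writer.start('ul', {})
writer.start('li', {})
writer.data('A')
writer.end('li')
writer.end('ul')
written = writer.close().getvalue()
print(written)
```

Because parse() calls TreeBuilder() with no arguments, a class like this would need to be passed as a zero-argument factory, for example `TreeBuilder=lambda: HtmlWriter(sys.stdout)`.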

To parse stdin and write html to stdout you could use ElementTree.write():

import sys

etree.ElementTree(parse(sys.stdin)).write(sys.stdout, method='html')

Output:

<ul><li>A</li><ul><li>B</li><li>C</li><ul><li>D</li><li>E</li></ul></ul></ul>

You can use any file, not just sys.stdin/sys.stdout.

Note: on Python 3, write to sys.stdout.buffer or pass encoding="unicode" to write(), because of the bytes/Unicode distinction.
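
A small Python 3 sketch of the encoding="unicode" variant. The tree is built by hand here so the snippet runs on its own; in practice the root element would come from parse():

```python
import sys
from xml.etree import ElementTree as etree

# Build a tiny tree by hand so the example is self-contained.
root = etree.Element("ul")
item = etree.SubElement(root, "li")
item.text = "A"

# encoding="unicode" makes write() emit str, which a text stream accepts.
etree.ElementTree(root).write(sys.stdout, encoding="unicode", method="html")

# The same markup as a string, without going through a file object.
markup = etree.tostring(root, encoding="unicode", method="html")
```

Without encoding="unicode", write() produces bytes, which is why the alternative is to target sys.stdout.buffer.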

jfs answered Sep 28 '22 00:09
