I have a list of dicts that stores URLs. It has simply two fields, title
and url
. Example:
[
{'title': 'Index Page', 'url': 'http://www.example.com/something/index.htm'},
{'title': 'Other Page', 'url': 'http://www.example.com/something/other.htm'},
{'title': 'About Page', 'url': 'http://www.example.com/thatthing/about.htm'},
{'title': 'Detail Page', 'url': 'http://www.example.com/something/thisthing/detail.htm'},
]
However, I'd to get a tree structure from this list of dicts. I'm looking for something like this:
{ 'www.example.com':
[
{ 'something':
[
{ 'thisthing':
[
{ 'title': 'Detail Page', 'url': 'detail.htm'}
]
},
[
{ 'title': 'Index Page', 'url': 'index.htm'},
{ 'title': 'Other Page', 'url': 'other.htm'}
]
]
},
{ 'thatthing':
[
{ 'title': 'About Page', 'url': 'about.htm'}
]
}
]
}
My first attempt at this would be a urlparse soup in a bunch of for loops and I'm confident that there's a better and faster way to do this.
I've seen people on SO work magic with list comprehensions, lambda functions, etc. I'm still in the process of figuring it out.
(For Django developers: I'll be using this my Django application. I'm storing the URLs in a model called Page
which has two fields name
and title
)
Third time is the charm... that is some nice structure you have there :). In your comment you mention that you "haven't been able to think of a better tree format to represent data like this"... this made me again take the liberty to (just slightly) alter the formatting of the output. In order to dynamically add subelements, a dictionary has to be created to house them. But for "leaf nodes", this dictionary is never filled in. If desired these can of course be removed by another loop, but it cannot happen during the iteration because the empty dict
should be present for possible new nodes. The some goes for nodes that do not have a file in the: these will contain an empty list
.
ll = [
{'title': 'Index Page', 'url': 'http://www.example.com/something/index.htm'},
{'title': 'Other Page', 'url': 'http://www.example.com/something/other.htm'},
{'title': 'About Page', 'url': 'http://www.example.com/thatthing/about.htm'},
{'title': 'Detail Page', 'url': 'http://www.example.com/something/thisthing/detail.htm'},
]
# First build a list of all url segments: final item is the title/url dict
paths = []
for item in ll:
split = item['url'].split('/')
paths.append(split[2:-1])
paths[-1].append({'title': item['title'], 'url': split[-1]})
# Loop over these paths, building the format as we go along
root = {}
for path in paths:
branch = root.setdefault(path[0], [{}, []])
for step in path[1:-1]:
branch = branch[0].setdefault(step, [{}, []])
branch[1].append(path[-1])
# As for the cleanup: because of the alternating lists and
# dicts it is a bit more complex, but the following works:
def walker(coll):
if isinstance(coll, list):
for item in coll:
yield item
if isinstance(coll, dict):
for item in coll.itervalues():
yield item
def deleter(coll):
for data in walker(coll):
if data == [] or data == {}:
coll.remove(data)
deleter(data)
deleter(root)
import pprint
pprint.pprint(root)
Output:
{'www.example.com':
[
{'something':
[
{'thisthing':
[
[
{'title': 'Detail Page', 'url': 'detail.htm'}
]
]
},
[
{'title': 'Index Page', 'url': 'index.htm'},
{'title': 'Other Page', 'url': 'other.htm'}
]
],
'thatthing':
[
[
{'title': 'About Page', 'url': 'about.htm'}
]
]
},
]
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With