I have a list of URLs and headers from a newspaper site in my country. As a general example:
x = ['URL1','news1','news2','news3','URL2','news1','news2','URL3','news1']
Each URL element has a corresponding sequence of 'news' elements, which can differ in length. In the example above, URL1 has three corresponding news items and URL3 has only one.
Sometimes a URL has no corresponding "news" element:
y = ['URL4','news1','news2','URL5','URL6','news1']
I can easily find every URL index and the "news" elements of each URL.
My question is: Is it possible to transform this list into a dictionary in which the URL element is the key and the "news" elements are a list/tuple-value?
Expected Output
z = {'URL1':('news1', 'news2', 'news3'),
     'URL2':('news1', 'news2'),
     'URL3':('news1',),
     'URL4':('news1', 'news2'),
     'URL5':(),
     'URL6':('news1',)}
I've seen a similar question in this post, but it doesn't solve my problem.
A dictionary key must be hashable, which in practice means it should be of an immutable type. For example, you can use an integer, float, string, Boolean, or a tuple of such values as a dictionary key. However, neither a list nor another dictionary can serve as a dictionary key, because lists and dictionaries are mutable.
Lists and dictionaries can both be nested. A list can contain another list. A dictionary can contain another dictionary. A dictionary can also contain a list, and vice versa.
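To illustrate the key restriction described above, here is a minimal sketch. The names `d`, `bad_key`, and `raised` are illustrative only and do not come from the question.

```python
# Hashable (immutable) objects such as strings and tuples work as keys.
d = {}
d['URL1'] = ('news1', 'news2')
d[('URL', 2)] = ('news1',)

# A list is mutable and unhashable, so using it as a key raises TypeError.
bad_key = ['URL', 3]
try:
    d[bad_key] = ()
    raised = False
except TypeError:
    raised = True

# Values have no such restriction: a list (even a nested one) is fine.
d['URL4'] = [['news1'], ['news2']]
```

This is why the URL strings can serve as keys here, while the news items can be stored as either a list or a tuple.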
You can do it like this:
>>> y = ['URL4','news1','news2','URL5','URL6','news1']
>>> result = {}
>>> current_url = None
>>> for entry in y:
...     if entry.startswith('URL'):
...         current_url = entry
...         result[current_url] = ()
...     else:
...         result[current_url] += (entry, )
...
>>> result
{'URL4': ('news1', 'news2'), 'URL5': (), 'URL6': ('news1',)}
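The same loop can be wrapped in a reusable function. The following is only a sketch under a few assumptions: the function name `group_by_url` and the `is_url` parameter are hypothetical, URLs are recognised by the `'URL'` prefix as in the example, and news items that appear before the first URL are skipped. It collects items in lists and converts them to tuples at the end, which avoids building a new tuple on every append.

```python
def group_by_url(entries, is_url=lambda s: s.startswith('URL')):
    """Map each URL to a tuple of the news items that follow it."""
    result = {}
    current = None
    for entry in entries:
        if is_url(entry):
            current = entry
            # setdefault keeps items already collected if a URL repeats
            result.setdefault(current, [])
        elif current is not None:
            result[current].append(entry)
        # news items that appear before any URL are skipped
    # convert the accumulated lists to tuples at the end
    return {url: tuple(news) for url, news in result.items()}


y = ['URL4', 'news1', 'news2', 'URL5', 'URL6', 'news1']
print(group_by_url(y))
# {'URL4': ('news1', 'news2'), 'URL5': (), 'URL6': ('news1',)}
```

For real data you would pass your own `is_url` test, for example one that checks for an `http` prefix.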
You can use itertools.groupby with a key function to identify a URL:
from itertools import groupby

def _key(url):
    # In the body of _key, write code to identify a URL
    return url.startswith("URL")

data = ['URL1','news1','news2','news3','URL2','news1','news2','URL3','news1', 'URL4','news1','news2','URL5','URL6','news1']
# Split the list into alternating runs of URLs and news items
new_d = [list(b) for _, b in groupby(data, key=_key)]
# Pair each run of URLs with the run of news items that follows it
grouped = [[new_d[i], tuple(new_d[i+1])] for i in range(0, len(new_d), 2)]
# Only the last URL in a run owns the news; earlier URLs in that run get ()
result = dict([i for [*c, a], b in grouped for i in [(i, ()) for i in c]+[(a, b)]])
Output:
{
'URL1': ('news1', 'news2', 'news3'),
'URL2': ('news1', 'news2'),
'URL3': ('news1',),
'URL4': ('news1', 'news2'),
'URL5': (),
'URL6': ('news1',)
}
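Note that the groupby version above relies on the list starting with a URL and ending with a news item. If the list ends with a URL, new_d[i+1] raises IndexError. A possible variant that tolerates both cases is sketched below. The function name `group_with_groupby` and the `is_url` parameter are hypothetical, and news items before the first URL are assumed to be safe to drop.

```python
from itertools import groupby


def group_with_groupby(data, is_url=lambda s: s.startswith("URL")):
    """Group news items under the URL that precedes them, using groupby."""
    result = {}
    last_url = None
    # groupby yields alternating runs: (True, urls...) and (False, news...)
    for url_run, items in groupby(data, key=is_url):
        items = list(items)
        if url_run:
            # every URL in a run starts empty; the last one owns the next news run
            for url in items:
                result[url] = ()
            last_url = items[-1]
        elif last_url is not None:
            result[last_url] = tuple(items)
    return result


# A list that ends with a URL no longer raises an error
print(group_with_groupby(['URL1', 'news1', 'URL2']))
# {'URL1': ('news1',), 'URL2': ()}
```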