I have two lists:
The first one is a regular list that contains sitemap URLs:
ur = ['https://www.hi.de/hu/sitemap.xml',
'https://www.hi.de/ma/sitemap.xml',
'https://www.hi.de/au/sitemap.xml',
]
The second list is nested and contains the links that were listed in those sitemaps, with a date for every link:
wh = [['No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],
['2019-11-13', 'https://www.hi.de/ma/artikel/xxx'],
['2019-11-12', 'https://www.hi.de/ma/artikel/xxx'],
['2019-11-11', 'https://www.hi.de/au/artikel/xxx']]
Now I want to merge the list with the nested list, based on the sitemap each link came from, like this:
ui = [['https://www.hi.de/hu/sitemap.xml', 'No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],
['https://www.hi.de/ma/sitemap.xml', '2019-11-13', 'https://www.hi.de/ma/artikel/xxx'],
['https://www.hi.de/ma/sitemap.xml', '2019-11-12', 'https://www.hi.de/ma/artikel/xxx'],
['https://www.hi.de/au/sitemap.xml', '2019-11-11', 'https://www.hi.de/au/artikel/xxx']]
But with my code:
ui = [[(url2, x) for url2 in ur for x in y if url2.rsplit('/', 1)[0] in x] for y in wh]
The date in every sublist gets deleted and additionally the entries are stored in a tuple like this:
...
[[('https://www.hi.de/hu/sitemap.xml', 'https://www.hi.de/hu/artikel/xxx', '')],
...
How can I change the code to get the desired result in the variable ui?
You can concatenate multiple lists into one by unpacking them with the * operator. For example, [*list1, *list2] creates a new list containing the items of list1 followed by those of list2. This is useful when you want to combine several lists in a single expression.
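Applied to this question, unpacking can build a single merged row. The following is only a small sketch; the names `sitemap`, `row` and `merged` are made up for the illustration, and the values are copied from the question's data.

```python
# Hypothetical single entry, using values from the question.
sitemap = 'https://www.hi.de/ma/sitemap.xml'
row = ['2019-11-13', 'https://www.hi.de/ma/artikel/xxx']

# Unpacking creates a new list; `row` itself is left unchanged.
merged = [sitemap, *row]
print(merged)
```

Because a new list is created, the sublists in the original nested list are not modified.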
Python's extend() method can also be used to concatenate two lists. It iterates over its argument and appends each item to the list it is called on, so list1.extend(list2) adds all elements of list2 to list1. The method modifies list1 in place and returns None.
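A short sketch of the in-place behaviour, again with illustrative names (`row`, `entry`, `result`) and values taken from the question:

```python
row = ['https://www.hi.de/au/sitemap.xml']
entry = ['2019-11-11', 'https://www.hi.de/au/artikel/xxx']

# extend() mutates `row` and returns None, so do not assign its result.
result = row.extend(entry)
print(row)     # the sitemap URL followed by the date and the article link
print(result)  # None
```

Since extend() changes the list it is called on, it is less convenient inside a list comprehension than building a new list with + or unpacking.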
You can use a list comprehension that checks for the matching sitemap between two lists to get your desired result:
ur = ['https://www.hi.de/hu/sitemap.xml',
'https://www.hi.de/ma/sitemap.xml',
'https://www.hi.de/au/sitemap.xml',
]
wh = [['No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],
['2019-11-13', 'https://www.hi.de/ma/artikel/xxx'],
['2019-11-12', 'https://www.hi.de/ma/artikel/xxx'],
['2019-11-11', 'https://www.hi.de/au/artikel/xxx']]
print([[u] + x for x in wh for u in ur if x[1].split('/')[3] == u.split('/')[3]])
which outputs:
[['https://www.hi.de/hu/sitemap.xml', 'No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],
['https://www.hi.de/ma/sitemap.xml', '2019-11-13', 'https://www.hi.de/ma/artikel/xxx'],
['https://www.hi.de/ma/sitemap.xml', '2019-11-12', 'https://www.hi.de/ma/artikel/xxx'],
['https://www.hi.de/au/sitemap.xml', '2019-11-11', 'https://www.hi.de/au/artikel/xxx']]
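The index 3 in split('/') depends on every URL having the same scheme and host layout. As a possible variation, the first path segment can be compared using urllib.parse from the standard library. This is a sketch, assuming the section (hu, ma, au) is always the first path component; the helper name `section` is hypothetical.

```python
from urllib.parse import urlsplit

ur = ['https://www.hi.de/hu/sitemap.xml',
      'https://www.hi.de/ma/sitemap.xml',
      'https://www.hi.de/au/sitemap.xml']
wh = [['No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],
      ['2019-11-13', 'https://www.hi.de/ma/artikel/xxx'],
      ['2019-11-12', 'https://www.hi.de/ma/artikel/xxx'],
      ['2019-11-11', 'https://www.hi.de/au/artikel/xxx']]

def section(url):
    """Return the first path segment of a URL, e.g. 'hu' (hypothetical helper)."""
    return urlsplit(url).path.strip('/').split('/')[0]

# Prepend the matching sitemap to each sublist; the link is assumed to be at index 1.
ui = [[u] + row for row in wh for u in ur if section(row[1]) == section(u)]
print(ui)
```

Rows whose link matches no sitemap are silently dropped by this comprehension, which may or may not be what you want.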
You can transform ur to a dictionary for easier lookup:
import re
ur = ['https://www.hi.de/hu/sitemap.xml', 'https://www.hi.de/ma/sitemap.xml', 'https://www.hi.de/au/sitemap.xml']
data = [['No-Date', 'https://www.hi.de/hu/artikel/xxx'], ['2019-11-13', 'https://www.hi.de/ma/artikel/xxx'], ['2019-11-12', 'https://www.hi.de/ma/artikel/xxx'], ['2019-11-11', 'https://www.hi.de/au/artikel/xxx']]
# Raw strings keep the regex escapes (\. and \w) from being treated as string escapes.
d = dict((re.split(r'/(?=sitemap\.)', i)[0], i) for i in ur)
result = [[d[re.split(r'/(?=\w{3,}/)', b)[0]], a, b] for a, b in data]
Output:
[['https://www.hi.de/hu/sitemap.xml', 'No-Date', 'https://www.hi.de/hu/artikel/xxx'],
['https://www.hi.de/ma/sitemap.xml', '2019-11-13', 'https://www.hi.de/ma/artikel/xxx'],
['https://www.hi.de/ma/sitemap.xml', '2019-11-12', 'https://www.hi.de/ma/artikel/xxx'],
['https://www.hi.de/au/sitemap.xml', '2019-11-11', 'https://www.hi.de/au/artikel/xxx']]
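The unpacking `for a, b in data` only works when every sublist has exactly two items, whereas the first sublist in the question has three. A regex-free sketch of the same dictionary idea that keeps sublists of any length might look like this. It assumes every article link sits exactly two path levels below its sitemap's directory (as in .../hu/artikel/xxx), and the names `prefix_to_sitemap` and `ui` are illustrative.

```python
ur = ['https://www.hi.de/hu/sitemap.xml',
      'https://www.hi.de/ma/sitemap.xml',
      'https://www.hi.de/au/sitemap.xml']
wh = [['No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],
      ['2019-11-13', 'https://www.hi.de/ma/artikel/xxx'],
      ['2019-11-12', 'https://www.hi.de/ma/artikel/xxx'],
      ['2019-11-11', 'https://www.hi.de/au/artikel/xxx']]

# Map each sitemap's directory (the URL without '/sitemap.xml') to the sitemap URL.
prefix_to_sitemap = {u.rsplit('/', 1)[0]: u for u in ur}

# Strip the last two path segments of the link (assumption: '/artikel/xxx') to get the key.
# dict.get() yields None when no sitemap matches, instead of raising KeyError.
ui = [[prefix_to_sitemap.get(row[1].rsplit('/', 2)[0]), *row] for row in wh]
print(ui)
```

With this version an unmatched link produces a row that starts with None, which makes missing sitemaps easy to spot.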