Suppose I have this list:
list1=["House of Mine (1293) Item 21",
"House of Mine (1292) Item 24",
"The yard (1000) Item 1 ",
"The yard (1000) Item 2 ",
"The yard (1000) Item 4 "]
I want to add each item of it to a group (a list inside a list, in this case) if the substring up to the (XXXX) is the same.
So, in this case, I am expecting to have:
[["House of Mine (1293) Item 21",
"House of Mine (1292) Item 24"],
["The yard (1000) Item 1 ",
"The yard (1000) Item 2 ",
"The yard (1000) Item 4 "]]
The following code is what I was able to make, but it's not working:
def group(list1):
    group=[]
    for i, itemg in enumerate(list1):
        try:
            group[i]
        except Exception:
            group.append([])
        for itemj in group[i]:
            if re.findall(re.split("\(\d{4}\)\(", itemg)[0], itemj):
                group[i].append(itemg)
            else:
                group.append([])
                group[-1].append(itemg)
    return group
Thanks to another topic on Stack Overflow, I have read the regular expressions page at http://www.diveintopython3.net/regular-expressions.html
I know the answer lies there, but I'm having difficulty understanding some of its concepts.
Set up the list to group:
>>> list1=["House of Mine (1293) Item 21","House of Mine (1292) Item 24", "The yard (1000) Item 1 ", "The yard (1000) Item 2 ", "The yard (1000) Item 4 "]
Define a function to use as the key for sorting and grouping (this time using the number in parentheses):
>>> keyf = lambda text: text.split("(")[1].split(")")[0]
>>> keyf
<function __main__.<lambda>>
>>> keyf(list1[0])
'1293'
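Sort the list in place, using the same key that will be used for grouping (as Adam Smith noted, a plain alphabetical list1.sort() is good enough for grouping this data, but it orders the groups differently from the output shown here):
>>> list1.sort(key=keyf)
Take groupby from itertools:
>>> from itertools import groupby
Check the concept:
>>> for gr, items in groupby(list1, key=keyf):
...     print("gr", gr)
...     print("items", list(items))
...
gr 1000
items ['The yard (1000) Item 1 ', 'The yard (1000) Item 2 ', 'The yard (1000) Item 4 ']
gr 1292
items ['House of Mine (1292) Item 24']
gr 1293
items ['House of Mine (1293) Item 21']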
>>> list1
['The yard (1000) Item 1 ',
'The yard (1000) Item 2 ',
'The yard (1000) Item 4 ',
'House of Mine (1292) Item 24',
'House of Mine (1293) Item 21']
Note that we had to call list() on items, because groupby yields each group as an iterator rather than as a list.
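The sort step matters because groupby only merges neighbouring items that share a key; it does not collect matching items from across the whole list. A minimal sketch of that behaviour, written for Python 3, with the key written as a named function (key_number is a name chosen here for illustration, not part of the original answer):

```python
from itertools import groupby


def key_number(text):
    # Same idea as keyf above: the text between the first "(" and the next ")".
    return text.split("(")[1].split(")")[0]


unsorted_items = [
    "The yard (1000) Item 1 ",
    "House of Mine (1292) Item 24",
    "The yard (1000) Item 2 ",
]

# Without sorting, the two "1000" items are not adjacent,
# so groupby puts them in separate groups.
unsorted_groups = [
    list(items) for _, items in groupby(unsorted_items, key=key_number)
]

# Sorting by the same key first makes equal keys adjacent.
sorted_groups = [
    list(items)
    for _, items in groupby(sorted(unsorted_items, key=key_number), key=key_number)
]

print(len(unsorted_groups))  # 3
print(len(sorted_groups))    # 2
```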
Now using a list comprehension:
>>> res = [list(items) for gr, items in groupby(list1, key=keyf)]
>>> res
[['The yard (1000) Item 1 ',
'The yard (1000) Item 2 ',
'The yard (1000) Item 4 '],
['House of Mine (1292) Item 24'],
['House of Mine (1293) Item 21']]
and we are done.
If you want to group by all the text before the first "(", the only change is the key function:
>>> keyf = lambda text: text.split("(")[0]
>>> list1=["House of Mine (1293) Item 21","House of Mine (1292) Item 24", "The yard (1000) Item 1 ", "The yard (1000) Item 2 ", "The yard (1000) Item 4 "]
>>> keyf = lambda text: text.split("(")[0]
>>> [list(items) for gr, items in groupby(sorted(list1), key=keyf)]
[['House of Mine (1292) Item 24', 'House of Mine (1293) Item 21'],
['The yard (1000) Item 1 ',
'The yard (1000) Item 2 ',
'The yard (1000) Item 4 ']]
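If the original order of the items should be kept (as in the expected output in the question, where the 1293 item comes before the 1292 item), sorting is not an option. One possible alternative, which swaps groupby for a plain dict, is sketched below. The function name group_by_prefix is an assumption made for this example, and it relies on dicts preserving insertion order (Python 3.7+):

```python
def group_by_prefix(items):
    """Group items by the text before the first "(", keeping input order."""
    groups = {}  # maps prefix -> list of items, in first-seen order
    for text in items:
        prefix = text.split("(")[0]
        groups.setdefault(prefix, []).append(text)
    return list(groups.values())


list1 = [
    "House of Mine (1293) Item 21",
    "House of Mine (1292) Item 24",
    "The yard (1000) Item 1 ",
    "The yard (1000) Item 2 ",
    "The yard (1000) Item 4 ",
]

result = group_by_prefix(list1)
for group in result:
    print(group)
```

Because nothing is sorted, items with the same prefix do not need to be adjacent in the input for this to work.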
Using re.findall
The solution above assumes that "(" is the delimiter and ignores the requirement of having four digits inside the parentheses. That requirement can be handled using re.
>>> import re
>>> keyf = lambda text: re.findall(r".+(?=\(\d{4}\))", text)[0]
>>> text = 'House of Mine (1293) Item 21'
>>> keyf(text)
'House of Mine '
But it raises IndexError if the text does not have the expected content, because we are trying to access the item at index 0 of an empty list:
>>> text = "nothing here"
>>> keyf(text)
IndexError: list index out of range
A simple trick avoids this: append the original text to the list of matches, so that there is always something at index 0:
>>> keyf = lambda text: (re.findall(r".+(?=\(\d{4}\))", text) + [text])[0]
>>> text = "nothing here"
>>> keyf(text)
'nothing here'
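The same fallback can also be spelled out with an explicit check instead of the list-concatenation trick. The sketch below is only one way to write it: it reuses the pattern from above with re.search, and the names FOUR_DIGIT_PREFIX and key_prefix are made up for this example.

```python
import re

# Same pattern as above: everything before a "(dddd)" token with exactly four digits.
FOUR_DIGIT_PREFIX = re.compile(r".+(?=\(\d{4}\))")


def key_prefix(text):
    match = FOUR_DIGIT_PREFIX.search(text)
    # Fall back to the whole text when no "(dddd)" token is present.
    return match.group(0) if match else text


print(repr(key_prefix("House of Mine (1293) Item 21")))  # 'House of Mine '
print(repr(key_prefix("nothing here")))                  # 'nothing here'
```

A named function like this can be passed as key= to sorted and groupby in exactly the same way as the lambda.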
Final solution using re
>>> import re
>>> from itertools import groupby
>>> keyf = lambda text: (re.findall(r".+(?=\(\d{4}\))", text) + [text])[0]
>>> [list(items) for gr, items in groupby(sorted(list1), key=keyf)]
[['House of Mine (1292) Item 24', 'House of Mine (1293) Item 21'],
['The yard (1000) Item 1 ',
'The yard (1000) Item 2 ',
'The yard (1000) Item 4 ']]