I want to find all the elements in a list that match a regex. To decrease the number of times regex matching is done, I created a string by joining the elements delimited with a space, as given below:
import re
list_a = ["4123", "7648", "afjsdn", "ujaf", "huh23"]
regex_num = r"\d+"
string_a = " ".join(list_a)
num_matches = re.findall(regex_num, string_a)
The list and the matches are as given below:
list_a: ['4123', '7648', 'afjsdn', 'ujaf', 'huh23']
matches: ['4123', '7648', '23']
Now that I have all my matches I want to know whether the match was part of the element/token or an entire token. One way I can do this is by comparing the match with the actual token/element:
"23" == "huh23"
False
But to do this, I would need the token's serial number (its index in the list), which isn't available directly. The only position information regex matching provides is the span of the match, which is at the character level.
The other path I could take is to just apply regex matching for all the elements by looping through the list and comparing the string with the match if there is a match.
I would like to reduce as much time complexity as possible for this operation.
Is there a more pythonic way of determining whether a match is just a part of the token or is there a more pythonic way to find the serial number of the matched word so that the initial list could be exploited for string comparison?
Any help would be appreciated. Thanks in advance!
Edit 1:
If my list is something like this (as suggested by @Artyom Vancyan in the comments):
list_a = ["4123", "7648", "afjsdn", "ujaf", "huh23", "n23kl3l24"]
The output I would like is:
matches_with_slno = [[0, '4123'], [1, '7648'], [4, '23'], [5, '23'], [5, '3'], [5, '24']]
The most Pythonic solution I would recommend is combining enumerate with a generator.
import re
arr = ['4123', '7648', 'afjsdn', 'ujaf', 'huh23', 'n23kl3l24']
def process(array):
    for index, item in enumerate(array):
        # Flatten the per-item list of [index, match] pairs into the output stream
        yield from [[index, match] for match in re.findall(r"\d+", item)]
print(list(process(arr))) # [[0, '4123'], [1, '7648'], [4, '23'], [5, '23'], [5, '3'], [5, '24']]
One use of yield from is flattening nested lists. Note that yield from cannot be used inside a list comprehension; otherwise this could be a one-liner. enumerate supplies each element's index (serial number). Because the function body contains yield, process is a generator function.
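The question also asks whether a match is the entire token or only part of it. A minimal sketch of one way to report that alongside the index is below; process_with_flag and its pattern parameter are hypothetical names introduced here for illustration, and the third element of each result is an addition to the output format requested in the question.

```python
import re

def process_with_flag(array, pattern=r"\d+"):
    """Yield [index, match, is_whole_token] for every match in every item."""
    regex = re.compile(pattern)
    for index, item in enumerate(array):
        for match in regex.finditer(item):
            text = match.group()
            # The match covers the whole token only when it equals the item itself
            yield [index, text, text == item]

arr = ['4123', '7648', 'afjsdn', 'ujaf', 'huh23', 'n23kl3l24']
result = list(process_with_flag(arr))
print(result)
```

For the sample list, only the entries at index 0 and 1 should carry True, since '4123' and '7648' are the only tokens made up entirely of digits.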
NOTE: The process generator combines a loop with a list comprehension, so the same result can also be written as a single nested list comprehension:
import re
arr = ['4123', '7648', 'afjsdn', 'ujaf', 'huh23', 'n23kl3l24']
print([[index, match] for index, item in enumerate(arr) for match in re.findall(r"\d+", item)]) # [[0, '4123'], [1, '7648'], [4, '23'], [5, '23'], [5, '3'], [5, '24']]
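If you would rather keep the single search over the joined string from the question, the character-level span can be mapped back to a token index. The sketch below is one possible way to do it, not a claim that it is faster: it records the offset at which each token starts in the joined string and uses bisect to find the token containing each match. matches_with_index is a hypothetical helper name, and the approach assumes the pattern can never match across the space separator (true for \d+).

```python
import re
from bisect import bisect_right
from itertools import accumulate

def matches_with_index(tokens, pattern=r"\d+"):
    """Search the joined string once and map each match back to its token index."""
    joined = " ".join(tokens)
    # starts[i] is the offset in `joined` where tokens[i] begins;
    # each token is followed by one separator character, hence the +1.
    starts = [0] + list(accumulate(len(tok) + 1 for tok in tokens))[:-1]
    results = []
    for match in re.finditer(pattern, joined):
        # Index of the last token whose start offset is <= the match start
        index = bisect_right(starts, match.start()) - 1
        results.append([index, match.group()])
    return results

arr = ['4123', '7648', 'afjsdn', 'ujaf', 'huh23', 'n23kl3l24']
pairs = matches_with_index(arr)
print(pairs)
```

Both this version and the per-item loop read every character of every token, so for a simple pattern like this one the choice is probably more about readability than about time complexity.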