I'm working on a text pattern problem. I have the following input:
term = 'CG-14/0,2-L-0_2'
I need to remove all the possible punctuation (delimiters) from the input term. Basically I need the following output from the input term -
'CG1402L02'
I also need to store (in any format (object, dict, tuple etc.)) the delimiter and the position of the delimiter before removing the delimiters.
Example of the output (If returned as tuple) -
((-,2), (/,5), (,,7), (-,9), (-,11), (_,13))
I'm able to get the cleaned output using the following Python code (with re imported) -
re.sub(r'[^\w]', '', term.replace('_', ''))
But how do I store the delimiter and delimiter position (in the most efficient way) before removing the delimiters?
You can simply walk through term once and collect all the necessary information on the way:
from string import ascii_letters, digits

term = 'CG-14/0,2-L-0_2'

# defined set of allowed characters a-zA-Z0-9
# set lookup is O(1) - fast
ok = set(digits + ascii_letters)

specials = {}
clean = []
for i, c in enumerate(term):
    if c in ok:
        clean.append(c)
    else:
        specials.setdefault(c, [])
        specials[c].append(i)

cleaned = ''.join(clean)
print(clean)
print(cleaned)
print(specials)
Output:
['C', 'G', '1', '4', '0', '2', 'L', '0', '2'] # list of characters in set ok
CG1402L02 # the ''.join()ed list
{'-': [2, 9, 11], '/': [5], ',': [7], '_': [13]} # dict of characters/positions not in ok
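If the setdefault/append pair feels clumsy, the same single pass can be written with collections.defaultdict, which creates the empty list on first access. This is only a sketch of a variant, not a different algorithm; it assumes the same term and the same ok set as above:

```python
from collections import defaultdict
from string import ascii_letters, digits

term = 'CG-14/0,2-L-0_2'
ok = set(digits + ascii_letters)

# defaultdict(list) builds the empty list the first time a key is touched,
# so no explicit setdefault call is needed
specials = defaultdict(list)
clean = []
for i, c in enumerate(term):
    if c in ok:
        clean.append(c)
    else:
        specials[c].append(i)

cleaned = ''.join(clean)
print(cleaned)
print(dict(specials))  # convert back to a plain dict for printing
```

The positions stored are indices into the original term, exactly as in the version above.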
Alternatively, initialize
specials = []
and, inside the loop, replace the else branch with:
    else:
        specials.append((c, i))
to get a list of tuples instead of the dictionary:
[('-', 2), ('/', 5), (',', 7), ('-', 9), ('-', 11), ('_', 13)]
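Since the question already uses re, the same list of tuples can probably be collected with re.finditer as well. The sketch below assumes that "delimiter" means any character other than an ASCII letter or digit, mirroring the ok set in the loop version; the name delimiter is just an illustrative choice:

```python
import re

term = 'CG-14/0,2-L-0_2'

# Assumption: a delimiter is anything that is not an ASCII letter or digit
delimiter = re.compile(r'[^A-Za-z0-9]')

# finditer yields one match object per delimiter;
# m.group() is the character, m.start() its index in the original term
specials = [(m.group(), m.start()) for m in delimiter.finditer(term)]

# the same compiled pattern removes the delimiters
cleaned = delimiter.sub('', term)

print(specials)
print(cleaned)
```

This scans term twice (once for finditer, once for sub), so the single loop above does less work, though for short strings the difference is unlikely to matter.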