
How do I accumulate a sequence of digits in a string and convert them to one number?

Tags: python, string

I need to decode a string such as 'a3b2' into 'aaabb'. The problem arises when a number has two or more digits. E.g. for 'a10b3', the count after 'a' must be read as 10, not 1.

I need to start accumulating digits.

a = "a12345t5i6o2r43e2"
for i in range(0, len(a)-1):
  if a[i].isdigit() is False: 
    # Once I see a letter, I launch a while loop to check how long the digit
    # streak after it is: it can be a 2-, 3-, 4-, or 5-digit number, etc.
    print(a[i])
    current_digit_streak = ''
    counter = i+1
    while a[counter].isdigit():  #this gives index out of range error!
      current_digit_streak += a[counter]
      counter+=1

If I change the while loop to this:

while a[counter].isdigit() and counter < (len(a)-1):

it does work but omits the last letter. I should not use regex, only loops.

ERJAN asked Nov 27 '18 16:11


2 Answers

Regex is a good fit here.

import re
pat = re.compile(r"""
(\w)       # a word character, followed by...
(\d+)      # one or more digits""", flags=re.X)

s = "a12345t5i6o2r43e2"
groups = pat.findall(s)
# [('a', '12345'), ('t', '5'), ('i', '6'), ('o', '2'), ('r', '43'), ('e', '2')]

result = ''.join([lett*int(count) for lett, count in groups])

Since you can't use regex for some unknown reason, I recommend a recursive function that splits the string into (letter, count) parts.

import itertools

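def split_into_groups(s):
    if not s:
        return []
    lett, *rest = s
    # takewhile/dropwhile return iterators, so join them back into strings
    # before calling int() and before recursing.
    count = int(''.join(itertools.takewhile(str.isdigit, rest)))
    rest = ''.join(itertools.dropwhile(str.isdigit, rest))
    return [(lett, count)] + split_into_groups(rest)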

s = "a12345t5i6o2r43e2"
groups = split_into_groups(s)

result = ''.join([lett*count for lett, count in groups])

Or, using a more generic pattern borrowed from functional programming:

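import itertools

def unfold(f, x):
    # Apply f to the seed repeatedly; f returns (value, next_seed), or None when done.
    while True:
        step = f(x)
        if step is None:
            return
        v, x = step
        yield v

def get_group(s):
    # Returning None (rather than raising StopIteration) ends the unfold;
    # since Python 3.7 a StopIteration raised inside a generator becomes a RuntimeError.
    if not s:
        return None
    lett, *rest = s
    count = int(''.join(itertools.takewhile(str.isdigit, rest)))
    rest = ''.join(itertools.dropwhile(str.isdigit, rest))
    return lett*count, rest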

s = "a12345t5i6o2r43e2"
result = ''.join(unfold(get_group, s))
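
If the constraint really is "only loops", a plain index-based version is also possible. The following is a minimal sketch, not the only way to do it: the function name decode is made up for illustration, and it assumes a letter with no digits after it should appear once. The key point is that the bounds check comes before the index access, and that it compares against len(encoded) rather than len(encoded)-1, so the final digit is not dropped.

```python
def decode(encoded):
    """Expand a run-length string such as 'a10b3' using only loops."""
    pieces = []
    i = 0
    n = len(encoded)
    while i < n:
        letter = encoded[i]
        i += 1
        # Accumulate the digit streak. The bounds check is evaluated first,
        # so encoded[i] is never read past the end of the string.
        digits = ''
        while i < n and encoded[i].isdigit():
            digits += encoded[i]
            i += 1
        # Assumption: a letter with no digits after it counts once.
        count = int(digits) if digits else 1
        pieces.append(letter * count)
    return ''.join(pieces)

print(decode('a3b2'))   # aaabb
print(decode('a10b3'))  # aaaaaaaaaabbb
```

Because Python's `and` short-circuits, `i < n` being false stops the condition before `encoded[i]` is touched, which is what avoids the index out of range error from the question.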
Adam Smith answered Oct 15 '22 22:10


You could use groupby:

from itertools import groupby

text = 'a12345t5i6o2r43e2'

groups = [''.join(group) for _, group in groupby(text, key=str.isdigit)]
result = list(zip(groups[::2], groups[1::2]))

print(result)

Output

[('a', '12345'), ('t', '5'), ('i', '6'), ('o', '2'), ('r', '43'), ('e', '2')]
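
To get from those pairs to the decoded string, each letter can be repeated by its count and the pieces joined. This is a rough sketch under a few assumptions: the wrapper name decode_with_groupby is invented here, the text starts with a letter, and every single letter is followed by at least one digit.

```python
from itertools import groupby

def decode_with_groupby(text):
    # Split the text into alternating runs of non-digits and digits.
    groups = [''.join(group) for _, group in groupby(text, key=str.isdigit)]
    # Pair each letter run (even positions) with the digit run after it (odd positions).
    pairs = zip(groups[::2], groups[1::2])
    # Repeat each letter by its integer count and join the pieces.
    return ''.join(letter * int(count) for letter, count in pairs)

print(decode_with_groupby('a3b2'))  # aaabb
```

Note that two adjacent letters would be grouped into one run and repeated together, so inputs like 'ab2' need separate handling.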
Dani Mesejo answered Oct 15 '22 20:10