Replacing all numeric values with a formatted string

First, I want to find all the numeric values in a string:

import re

input_string = "高露潔光感白輕悅薄荷牙膏100   79.80"

numbers = re.finditer(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?', input_string)

for number in numbers:
    print("{}    start > {}, end > {}".format(number.group(), number.start(0), number.end(0)))

>>100    start > 12, end > 15
>>79.80    start > 18, end > 23

Then I want to replace every integer and float value with a formatted token:

INT_(number of digits) and FLT_(number of decimal places)

e.g. 100 -> INT_3 // 79.80 -> FLT_2

Thus, the expected output string is:

"高露潔光感白輕悅薄荷牙膏INT_3   FLT_2"

But Python's string replace method is kind of awkward here and can't achieve what I want.

So I am trying to build the result by slicing the string and concatenating the pieces:

string[:number.start(0)] + "INT_%s"%len(number.group()) +.....

which looks stupid and most importantly I still can't make it work.

Can anyone give me some advice on this problem?

2 Answers

Use re.sub with a callback function, inside which you can perform various manipulations on the match:

import re
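def repl(match):
    # Group 1 holds the number without its sign or exponent
    chunks = match.group(1).split(".")
    if len(chunks) == 2:
        # a decimal point is present: report the number of decimal places
        return "FLT_{}".format(len(chunks[1]))
    # no decimal point: report the number of digits
    return "INT_{}".format(len(chunks[0]))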

input_string = "高露潔光感白輕悅薄荷牙膏100   79.80"
result = re.sub(r'[-+]?([0-9]*\.?[0-9]+)(?:[eE][-+]?[0-9]+)?',repl,input_string)


  • The regex now has a capturing group around the number part (([0-9]*\.?[0-9]+)); its contents are analyzed inside the repl function.
  • Inside repl, the Group 1 contents are split on . to see whether we have a float/double. If so, we return the length of the fractional part; otherwise, the length of the integer number.
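
As a quick sanity check, the callback described above can be run against the sample input plus a few edge cases. This is only a sketch: the name PATTERN and the second test string are made up for illustration and are not part of the original question.

```python
import re

# Same pattern as above: group 1 captures the number without sign or exponent
PATTERN = r'[-+]?([0-9]*\.?[0-9]+)(?:[eE][-+]?[0-9]+)?'

def repl(match):
    chunks = match.group(1).split(".")
    if len(chunks) == 2:
        return "FLT_{}".format(len(chunks[1]))
    return "INT_{}".format(len(chunks[0]))

sample = re.sub(PATTERN, repl, "高露潔光感白輕悅薄荷牙膏100   79.80")
print(sample)  # 高露潔光感白輕悅薄荷牙膏INT_3   FLT_2

# Hypothetical edge cases: a signed float, a leading-dot float, an exponent
edge = re.sub(PATTERN, repl, "a -3.14 b .5 c 1e5 d")
print(edge)  # a FLT_2 b FLT_1 c INT_1 d
```

Note that the sign and the exponent are consumed by the match but do not affect the count, so 1e5 becomes INT_1.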

You need to group the parts of your regex, possibly like this:

import re

def repl(m):
    if m.group(1) is None:  # int: group 2 holds all the digits
        return "INT_%i" % len(m.group(2))
    else:  # float: group 2 holds the digits after the decimal point
        return "FLT_%i" % len(m.group(2))

input_string = "高露潔光感白輕悅薄荷牙膏100   79.80"

numbers = re.sub(r'[-+]?([0-9]*\.)?([0-9]+)([eE][-+]?[0-9]+)?',repl,input_string)        

  • group 0 is the whole matched string (it can be passed to float or int)
  • group 1 is the digits before the . plus the . itself if present, else None
  • group 2 is the digits after the . if there is one, else simply all the digits
  • group 3 is the exponent part if present, else None
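
To see what each group holds in practice, here is a small sketch that lists the groups for every match. The helper name show_groups and the test string are assumptions added for illustration only.

```python
import re

PATTERN = r'[-+]?([0-9]*\.)?([0-9]+)([eE][-+]?[0-9]+)?'

def show_groups(text):
    # Return (group 0, group 1, group 2, group 3) for every match in text
    return [(m.group(0), m.group(1), m.group(2), m.group(3))
            for m in re.finditer(PATTERN, text)]

for groups in show_groups("100 79.80 -1.5e3"):
    print(groups)
# ('100', None, '100', None)
# ('79.80', '79.', '80', None)
# ('-1.5e3', '1.', '5', 'e3')
```

The sign sits outside every capturing group, so it only appears in group 0.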

You can get a Python number from it with:

def parse(m):
    s = m.group(0)  # the whole match, including any sign
    # a dot or an exponent part means it must be a float
    if m.group(1) is not None or m.group(3) is not None:
        return float(s)
    return int(s)
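
A possible way to use this idea is to collect every number in the string with re.finditer. The sketch below is self-contained; the names to_number and extract_numbers are hypothetical and chosen just for this example.

```python
import re

PATTERN = r'[-+]?([0-9]*\.)?([0-9]+)([eE][-+]?[0-9]+)?'

def to_number(m):
    # Hypothetical helper: a dot (group 1) or an exponent (group 3) means float
    if m.group(1) is not None or m.group(3) is not None:
        return float(m.group(0))
    return int(m.group(0))

def extract_numbers(text):
    # Convert every match in text to an int or a float
    return [to_number(m) for m in re.finditer(PATTERN, text)]

print(extract_numbers("高露潔光感白輕悅薄荷牙膏100   79.80"))  # [100, 79.8]
```

Because group 0 includes the optional sign, negative values such as -1.5e3 are converted with their sign intact.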
