 

Replacing all numeric values with a formatted string

What I am trying to do is:

Find all the numeric values in a string.

import re

input_string = "高露潔光感白輕悅薄荷牙膏100   79.80"

numbers = re.finditer(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?', input_string)

for number in numbers:
    print("{}    start > {}, end > {}".format(number.group(), number.start(0), number.end(0)))

'''Output'''
>>100    start > 12, end > 15
>>79.80    start > 18, end > 23

Then I want to replace every integer and float value with a specific format:

INT_(number of digits) and FLT_(number of decimal places)

e.g. 100 -> INT_3, 79.80 -> FLT_2
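The naming rule itself can be written as a small function. This is a minimal sketch, assuming the token has no exponent and that a leading sign is not counted as a digit (the question doesn't say either way); `format_token` is a hypothetical name:

```python
def format_token(token: str) -> str:
    """Map one numeric token such as "100" or "79.80" to its INT_/FLT_ tag.

    Assumption: no exponent part, and a leading +/- is not counted.
    """
    digits = token.lstrip("+-")            # drop the sign before counting
    _, dot, fraction = digits.partition(".")
    if dot:
        # Float: count the digits after the decimal point
        return "FLT_{}".format(len(fraction))
    # Integer: count all digits
    return "INT_{}".format(len(digits))


print(format_token("100"))    # INT_3
print(format_token("79.80"))  # FLT_2
```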

So the expected output string is:

"高露潔光感白輕悅薄荷牙膏INT_3   FLT_2"

But Python's str.replace() doesn't fit this case: it swaps one fixed substring for another, so it can't achieve what I want.

So I tried building the result by slicing the string and concatenating the pieces:

string[:number.start(0)] + "INT_%s"%len(number.group()) +.....

which looks clumsy and, more importantly, I still can't make it work.
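The slice-and-concatenate idea can be made to work. One likely pitfall is that the offsets from finditer refer to the original string, so they stop lining up once you start editing that string. Collecting pieces of the original string into a list avoids this. A minimal sketch, where `NUMBER`, `tag` and `replace_numbers` are hypothetical names and a sign or exponent is assumed not to count toward the digits:

```python
import re

NUMBER = re.compile(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?')


def tag(number: str) -> str:
    # Hypothetical helper: ignore the sign and any exponent, then apply the rule
    mantissa = re.split(r'[eE]', number.lstrip("+-"))[0]
    if "." in mantissa:
        return "FLT_{}".format(len(mantissa.split(".")[1]))
    return "INT_{}".format(len(mantissa))


def replace_numbers(text: str) -> str:
    pieces = []
    last_end = 0                                      # end of the previous match
    for match in NUMBER.finditer(text):
        pieces.append(text[last_end:match.start()])   # text before this number
        pieces.append(tag(match.group()))             # tag in place of the number
        last_end = match.end()
    pieces.append(text[last_end:])                    # text after the last number
    return "".join(pieces)


print(replace_numbers("高露潔光感白輕悅薄荷牙膏100   79.80"))  # 高露潔光感白輕悅薄荷牙膏INT_3   FLT_2
```

re.sub with a callback, as in the answers, does the same bookkeeping for you.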

Can anyone give me some advice on this problem?

Pang Ho Ming asked Mar 12 '23 08:03



2 Answers

Use re.sub with a callback function, in which you can perform whatever manipulation you need on the match:

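import re

def repl(match):
    # Group 1 holds the unsigned number without its exponent, e.g. "100" or "79.80"
    chunks = match.group(1).split(".")
    if len(chunks) == 2:
        # Float: report the number of digits after the dot
        return "FLT_{}".format(len(chunks[1]))
    else:
        # Integer: report the number of digits
        return "INT_{}".format(len(chunks[0]))

input_string = "高露潔光感白輕悅薄荷牙膏100   79.80"
result = re.sub(r'[-+]?([0-9]*\.?[0-9]+)(?:[eE][-+]?[0-9]+)?', repl, input_string)
print(result)  # 高露潔光感白輕悅薄荷牙膏INT_3   FLT_2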


Details:

  • The regex now has a capturing group around the number part, ([0-9]*\.?[0-9]+), which is analyzed inside the repl method.
  • Inside the repl method, the contents of Group 1 are split on . to check whether we have a float/double; if so, we return the length of the fractional part, otherwise the length of the integer.
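Note that the sign and the exponent are matched but sit outside Group 1, so they disappear from the output and are not counted. A quick check with the answer's pattern and repl (copied so the snippet runs on its own; the sample inputs are made up for illustration):

```python
import re


def repl(match):
    chunks = match.group(1).split(".")
    if len(chunks) == 2:
        return "FLT_{}".format(len(chunks[1]))   # digits after the dot
    else:
        return "INT_{}".format(len(chunks[0]))   # all digits


PATTERN = r'[-+]?([0-9]*\.?[0-9]+)(?:[eE][-+]?[0-9]+)?'

# The whole match (sign and exponent included) is replaced by the tag
for text in ["price 100", "-3.14", "6.02e23", ".5"]:
    print(text, "->", re.sub(PATTERN, repl, text))
```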
Wiktor Stribiżew answered Mar 13 '23 20:03



You need to group the parts of your regex, possibly like this:

import re

def repl(m):
    if m.group(1) is None:  # no dot: integer, group 2 holds all digits
        return "INT_%i" % len(m.group(2))
    else:  # dot present: float, group 2 holds the decimal places
        return "FLT_%i" % len(m.group(2))

input_string = "高露潔光感白輕悅薄荷牙膏100   79.80"

result = re.sub(r'[-+]?([0-9]*\.)?([0-9]+)([eE][-+]?[0-9]+)?', repl, input_string)

print(result)

  • group 0 is the whole matched string (it can be passed to float() or int())
  • group 1 is any digits before the . plus the . itself, if present; otherwise None
  • group 2 is all digits after the ., if there is one; otherwise it is all the digits
  • group 3 is the exponent part, if present; otherwise None
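To see these groups for a few sample inputs (chosen for illustration), you can print Match.groups() for a full match of each one:

```python
import re

PATTERN = re.compile(r'[-+]?([0-9]*\.)?([0-9]+)([eE][-+]?[0-9]+)?')

# groups() returns (group 1, group 2, group 3); unmatched groups are None
for text in ["100", "79.80", "-1.5e3", ".25"]:
    m = PATTERN.fullmatch(text)
    print(text, m.groups())
```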

You can get a Python number from the match with:

def parse(m):
    s = m.group(0)
    # A dot or an exponent part means the match must be a float
    if m.group(1) is not None or m.group(3) is not None:
        return float(s)
    else:
        return int(s)
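For example, applied to each match that re.finditer finds in the question's string, parse yields real Python numbers. A sketch; `PATTERN` and `values` are illustrative names:

```python
import re

PATTERN = re.compile(r'[-+]?([0-9]*\.)?([0-9]+)([eE][-+]?[0-9]+)?')


def parse(m):
    s = m.group(0)
    # A dot or an exponent part means the match must be a float
    if m.group(1) is not None or m.group(3) is not None:
        return float(s)
    return int(s)


input_string = "高露潔光感白輕悅薄荷牙膏100   79.80"
values = [parse(m) for m in PATTERN.finditer(input_string)]
print(values)  # [100, 79.8]
```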
janbrohl answered Mar 13 '23 20:03
