
Replace in string based on function output

Tags:

python

regex

So, for input:

accessibility,random good bye

I want output:

a11y,r4m g2d bye

So, basically, I have to abbreviate all words of length greater than or equal to 4 in the following format: first_letter + length_of_all_letters_in_between + last_letter

I tried this:

re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])", r"\1" + str(len(r"\2")) + r"\3", s)

But it does not work. In JS, I would easily do:

str.replace(/([A-Za-z])([A-Za-z]{2,})([A-Za-z])/g, function(m, $1, $2, $3){
   return $1 + $2.length + $3;
});

How do I do the same in Python?

EDIT: I cannot afford to lose any punctuation present in original string.

Gaurang Tandon asked May 30 '15 10:05



2 Answers

What you are doing in JavaScript is right: you are passing an anonymous function, which is called for each match. What you pass in Python is a constant string. Because len(r"\2") is evaluated before re.sub is even called, the replacement argument is always r"\12\3". It is not a function that can be evaluated for each match!
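To make that concrete, here is a minimal sketch of what Python actually computes for the replacement argument in the question. The variable name replacement is only for illustration:

```python
# The replacement argument is built once, before re.sub ever runs.
replacement = r"\1" + str(len(r"\2")) + r"\3"

# r"\2" is just two characters (a backslash and a "2"), so its length is 2.
print(len(r"\2"))               # 2

# The result is a fixed string with no knowledge of any match.
print(replacement == r"\12\3")  # True
```

Nothing in that string depends on the text being searched, which is why the length never changes.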

While anonymous functions in Python aren't quite as useful as they are in JS, they do the job here:

>>> import re
>>> re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])", lambda m: "{}{}{}".format(m.group(1), len(m.group(2)), m.group(3)), "accessibility, random good bye")
'a11y, r4m g2d bye'

The lambda is called once for each match and receives a match object. It retrieves the three captured groups from that object and builds the replacement string from them.
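If the lambda feels cramped, the same idea can be written with a named function, which is probably the closest analogue to the JavaScript callback. This is only a sketch: the names WORD, abbreviate_match and abbreviate are made up for illustration, and the pattern is the one from the question.

```python
import re

# Same pattern as in the question: first letter, two or more middle letters, last letter.
WORD = re.compile(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])")

def abbreviate_match(m):
    """Build the replacement text for a single match object."""
    first, middle, last = m.groups()
    return first + str(len(middle)) + last

def abbreviate(text):
    # sub() calls abbreviate_match once per match and splices in its return value.
    return WORD.sub(abbreviate_match, text)

print(abbreviate("accessibility,random good bye"))  # a11y,r4m g2d bye
```

Only the matched words are replaced, so the comma and spaces in the input come through untouched.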

Cu3PO42 answered Sep 20 '22 13:09



The issue you're running into is that len(r'\2') is always 2, not the length of the second capturing group in your regular expression. You can use a lambda expression to create a function that works just like the code you would use in JavaScript:

re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])",
       lambda m: m.group(1) + str(len(m.group(2))) + m.group(3),
       s)

The m argument to the lambda is a match object, and the calls to its group method are equivalent to the backreferences you were using before.

It might be easier to just use a simple word-matching pattern with no capturing groups (group() can still be called with no argument to get the whole matched text). Note that \w also matches digits and underscores, so this is not strictly equivalent to [A-Za-z]:

re.sub(r'\w{4,}', lambda m: m.group()[0] + str(len(m.group())-2) + m.group()[-1], s)
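As a rough illustration of how the choice of character class matters with this whole-word approach, the sketch below runs the same replacement function with \w and with a letters-only class. The helper name shorten and the sample string are invented for the example.

```python
import re

def shorten(m):
    # m.group() with no argument returns the whole matched word.
    word = m.group()
    return word[0] + str(len(word) - 2) + word[-1]

sample = "user_name2"

# \w also matches digits and underscores, so the whole token counts as one word.
print(re.sub(r"\w{4,}", shorten, sample))        # u82

# A letters-only class treats "_" and "2" as separators.
print(re.sub(r"[A-Za-z]{4,}", shorten, sample))  # u2r_n2e2
```

For input made up purely of letters, spaces and punctuation, such as the string in the question, both patterns should give the same result.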
Blckknght answered Sep 18 '22 13:09
