So, for input:
accessibility,random good bye
I want output:
a11y,r4m g2d bye
So, basically, I have to abbreviate all words of length greater than or equal to 4 in the following format: first_letter + length_of_all_letters_in_between + last_letter
I tried this:
re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])", r"\1" + str(len(r"\2")) + r"\3", s)
But it does not work. In JS, I would simply do:
str.replace(/([A-Za-z])([A-Za-z]{2,})([A-Za-z])/g, function(m, $1, $2, $3){
return $1 + $2.length + $3;
});
How do I do the same in Python?
EDIT: I cannot afford to lose any punctuation present in the original string.
What you are doing in JavaScript is right: you are passing an anonymous function. In Python, you are passing a constant string (r"\12\3", since len(r"\2") is evaluated before the function call and is always 2), not a function that can be evaluated for each match.
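To see why the original attempt fails, it may help to look at the replacement string that actually reaches re.sub. A minimal sketch (the variable names are chosen for illustration and are not from the question):

```python
import re

pattern = r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])"

# len(r"\2") measures the two-character literal (a backslash and "2"),
# so the replacement is built once, before re.sub ever runs.
replacement = r"\1" + str(len(r"\2")) + r"\3"
print(replacement)  # \12\3

try:
    re.sub(pattern, replacement, "accessibility")
    failed = False
except re.error as exc:
    # "\12" is read as a reference to group 12, which the pattern does not have.
    failed = True
    print("re.error:", exc)
```

So the call does not just produce the wrong abbreviation; with this pattern it raises an error as soon as a match is found.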
While anonymous functions in Python aren't quite as useful as they are in JS, they do the job here:
>>> import re
>>> re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])", lambda m: "{}{}{}".format(m.group(1), len(m.group(2)), m.group(3)), "accessibility, random good bye")
'a11y, r4m g2d bye'
What happens here is that the lambda is called for each substitution, taking a match object. I then retrieve the needed information and build a substitution string from that.
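If the lambda feels dense, the same callback can be written as a named function. This is only a sketch; abbreviate and WORD are hypothetical names chosen for illustration:

```python
import re

# Same three-group pattern as above, compiled once for reuse.
WORD = re.compile(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])")

def abbreviate(match):
    """Called once per match; returns the replacement text for that match."""
    first, middle, last = match.groups()
    return f"{first}{len(middle)}{last}"

result = WORD.sub(abbreviate, "accessibility,random good bye")
print(result)  # a11y,r4m g2d bye
```

Because only the matched letters are replaced, the comma and spaces in the input pass through untouched.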
The issue you're running into is that len(r'\2') is always 2, not the length of the second capturing group in your regular expression. You can use a lambda expression to create a function that works just like the code you would use in JavaScript:
re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])",
       lambda m: m.group(1) + str(len(m.group(2))) + m.group(3),
       s)
The m argument to the lambda is a match object, and the calls to its group method are equivalent to the backreferences you were using before.
It might be easier to just use a simple word-matching pattern with no capturing groups (group() can still be called with no argument to get the whole matched text):
re.sub(r'\w{4,}', lambda m: m.group()[0] + str(len(m.group())-2) + m.group()[-1], s)
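One caveat with \w: it also matches digits and underscores, so it is broader than [A-Za-z] and will abbreviate tokens such as identifiers as a whole. A small sketch comparing the two character classes (shorten is a hypothetical helper, not part of the re module):

```python
import re

def shorten(text, word_pattern):
    # m.group() with no argument returns the whole matched word.
    return re.sub(
        word_pattern,
        lambda m: m.group()[0] + str(len(m.group()) - 2) + m.group()[-1],
        text,
    )

print(shorten("accessibility,random good bye", r"\w{4,}"))  # a11y,r4m g2d bye
print(shorten("user_42 said hello", r"\w{4,}"))             # u52 s2d h3o
print(shorten("user_42 said hello", r"[A-Za-z]{4,}"))       # u2r_42 s2d h3o
```

Which class is right depends on what should count as a word in your input; for plain prose the two give the same result.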