Replace in string based on function output

Tags: python, regex

So, for input:

accessibility,random good bye

I want output:

a11y,r4m g2d bye

So, basically, I have to abbreviate all words of length greater than or equal to 4 in the following format: first_letter + length_of_all_letters_in_between + last_letter
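For reference, the requested format applied to a single word is straightforward; this is a minimal sketch, where `abbreviate_word` is a hypothetical helper name and the 4-character threshold comes from the requirement above.

```python
def abbreviate_word(word: str) -> str:
    """Return first letter + number of inner letters + last letter.

    Words shorter than 4 characters are returned unchanged.
    """
    if len(word) < 4:
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


for w in ["accessibility", "random", "good", "bye"]:
    print(w, "->", abbreviate_word(w))
```

The hard part is applying this inside a larger string without disturbing the punctuation around each word.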

I tried this:

re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])", r"\1" + str(len(r"\2")) + r"\3", s)

But it does not work. In JS, I would easily do:

str.replace(/([A-Za-z])([A-Za-z]{2,})([A-Za-z])/g, function(m, $1, $2, $3){
   return $1 + $2.length + $3;
});

How do I do the same in Python?

EDIT: I cannot afford to lose any punctuation present in original string.


asked May 30 '15 10:05

Gaurang Tandon

2 Answers

What you are doing in JavaScript is right: you are passing an anonymous function. In Python, you are passing a constant string (r"\12\3", since len(r"\2") is evaluated once, before re.sub is called), not a function that can be evaluated for each match!
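A quick check makes this concrete: evaluating the replacement expression on its own shows the fixed string, and passing that string to re.sub fails outright, because the template parser reads \12 as a reference to group 12, which this pattern does not have. This is a minimal sketch; `PATTERN`, `template` and `error_message` are illustrative names.

```python
import re

PATTERN = r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])"

# len(r"\2") measures the two characters backslash + "2", so this
# expression is evaluated exactly once, before re.sub ever runs.
template = r"\1" + str(len(r"\2")) + r"\3"
print(template)  # \12\3

# As a replacement template, "\12" is read as a backreference to group 12,
# but the pattern only defines 3 groups, so re.sub raises re.error.
error_message = None
try:
    re.sub(PATTERN, template, "accessibility,random good bye")
except re.error as exc:
    error_message = str(exc)
print(error_message)
```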

Python's lambdas are more limited than JS anonymous functions (the body must be a single expression), but that is enough here:

>>> import re
>>> re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])", lambda m: "{}{}{}".format(m.group(1), len(m.group(2)), m.group(3)), "accessibility, random good bye")
'a11y, r4m g2d bye'

The lambda is called once for each match and receives the match object; it pulls out the groups it needs and builds the replacement string from them.
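If the lambda feels cramped, an ordinary named function works the same way, since re.sub accepts any callable that takes a match object and returns a string. A sketch; `abbreviate` is an illustrative name. Because re.sub only rewrites the matched spans, the comma in the question's input is left untouched, which covers the requirement about punctuation.

```python
import re

PATTERN = r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])"


def abbreviate(m):
    """Replacement callback: re.sub calls this once per match object."""
    # Mirrors the JS callback function(m, $1, $2, $3) by unpacking the groups.
    first, middle, last = m.groups()
    return f"{first}{len(middle)}{last}"


text = "accessibility,random good bye"
result = re.sub(PATTERN, abbreviate, text)
print(result)  # a11y,r4m g2d bye
```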

answered Sep 20 '22 13:09

Cu3PO42

The issue you're running into is that len(r'\2') is always 2, not the length of the second capturing group in your regular expression. You can use a lambda expression to create a function that works just like the code you would use in JavaScript:

import re

re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])",
       lambda m: m.group(1) + str(len(m.group(2))) + m.group(3),
       s)

The m argument to the lambda is a match object, and the calls to its group method are equivalent to the backreferences you were using before.
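To see that correspondence directly, Match.expand() applies a backreference template to a single match, so its result can be compared with the group() calls. A sketch using the question's pattern and input; `matches` is an illustrative name.

```python
import re

PATTERN = r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])"
text = "accessibility,random good bye"

matches = list(re.finditer(PATTERN, text))
for m in matches:
    # m.group(n) returns the same text the backreference \n would insert;
    # m.expand() applies a backreference template to this one match.
    print(m.group(1), m.group(2), m.group(3), "|", m.expand(r"\1-\2-\3"))
```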

It might be simpler to use a plain word-matching pattern with no capturing groups (group() with no argument returns the whole matched text). Note that \w also matches digits and underscores, so this is not exactly equivalent to the [A-Za-z] version:

re.sub(r'\w{4,}', lambda m: m.group()[0] + str(len(m.group())-2) + m.group()[-1], s)
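To illustrate the trade-off, here is the group-free version written with a named callback, run on the question's input and on a made-up identifier `id_1234`; `abbreviate_match` is an illustrative name, and `[A-Za-z]{4,}` is offered as a letters-only alternative that matches the same spans as the original three-group pattern.

```python
import re


def abbreviate_match(m):
    # No capturing groups: m.group() is the whole matched word.
    word = m.group()
    return f"{word[0]}{len(word) - 2}{word[-1]}"


text = "accessibility,random good bye"
print(re.sub(r"\w{4,}", abbreviate_match, text))             # a11y,r4m g2d bye
# \w also matches digits and underscores, so identifiers get abbreviated too:
print(re.sub(r"\w{4,}", abbreviate_match, "id_1234"))        # i54
# Restricting the pattern to ASCII letters leaves the identifier alone:
print(re.sub(r"[A-Za-z]{4,}", abbreviate_match, "id_1234"))  # id_1234
```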

answered Sep 18 '22 13:09

Blckknght
