 

How to match only those numbers which have an even number of `%`s preceding them?

I want to catch numbers appearing anywhere in a string, and replace them with "(.+)".

But I want to catch only those numbers which have an even number of %s preceding them. No worries if any surrounding chars get caught up: we can use capture groups to filter out the numbers.

I'm unable to come up with an ECMAScript regular expression for this.

Here is the playground:

abcd %1 %%2 %%%3 %%%%4 efgh

abcd%12%%34%%%666%%%%11efgh

A successful catch will behave like this:
[Image: desired behaviour]
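The rule can be stated without any regex. Below is a minimal sketch of a hypothetical helper, `evenPercentNumbers` (not from the question). For every maximal run of digits it counts the `%` characters immediately in front of that run and keeps the run when the count is even. Treating zero `%` as even is an assumption, since the question does not say.

```javascript
// Hypothetical helper: returns every maximal run of digits whose run of '%'
// immediately before it has even length (zero '%' is treated as even).
function evenPercentNumbers(str) {
  const isDigit = (c) => c >= "0" && c <= "9";
  const result = [];
  let i = 0;
  while (i < str.length) {
    if (!isDigit(str[i])) {
      i++;
      continue;
    }
    // Count the '%' characters directly in front of this digit run.
    let percents = 0;
    for (let k = i - 1; k >= 0 && str[k] === "%"; k--) percents++;
    // Consume the whole run so that "12" is never split into "1" and "2".
    let j = i;
    while (j < str.length && isDigit(str[j])) j++;
    if (percents % 2 === 0) result.push(str.slice(i, j));
    i = j;
  }
  return result;
}

console.log(evenPercentNumbers("abcd %1 %%2 %%%3 %%%%4 efgh")); // [ '2', '4' ]
console.log(evenPercentNumbers("abcd%12%%34%%%666%%%%11efgh")); // [ '34', '11' ]
```

Any regex answer can be checked against this reference. On the playground lines it should report 2 and 4 on the first line and 34 and 11 on the second.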


Things I have tried:

[Image: attempt 1]

[Image: attempt 2]

[Image: attempt 3]


As you may have noticed, the third attempt almost works; the only problems are in the second line of the playground. What I actually wanted that expression to say is:

Match a number if it is preceded by an even number of %s AND either of the following is true:

  • The whole expression above is preceded by nothing (the absence of any character, consumed or not).
  • The whole expression above is preceded by a character other than %.

Is there a way to match the absence of a character?
That's what I was trying to do by using \0 in the third attempt.
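On the "absence of a character" point: regular expressions have no escape for "no character here". "Either the start of the string or one non-% character" is written as the alternation `(?:^|[^%])`. In JavaScript, `\0` matches the NUL character U+0000, so it can never stand for "nothing". The sketch below checks that, then runs a consuming pattern of the shape described in the bullets above. It illustrates that shape only and is not a reconstruction of the asker's third attempt. `capturedNumbers` is a hypothetical helper, and `String.prototype.matchAll` (ES2020) is assumed to be available.

```javascript
// \0 is the NUL character, not "no character".
console.log(/\0/.test(""));    // false: the empty string contains no NUL
console.log(/\0/.test("a\0")); // true: an actual U+0000 character

// Consuming form of the rule: (start | one non-% char), an even run of %, digits.
const consuming = /(?:^|[^%])(?:%%)*(\d+)/g;

// Hypothetical helper: collect capture group 1 of every match.
function capturedNumbers(re, str) {
  return Array.from(str.matchAll(re), (m) => m[1]);
}

console.log(capturedNumbers(consuming, "abcd %1 %%2 %%%3 %%%%4 efgh")); // [ '2', '4' ]
console.log(capturedNumbers(consuming, "abcd%12%%34%%%666%%%%11efgh")); // [ '2', '4', '66', '1' ]
```

The first line comes out right. On the second line, `[^%]` also accepts a digit, so the pattern starts inside digit runs ("12" yields "2", "666" yields "66"). That is exactly the second-line failure described above.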

Asked by AneesAhmed777 on Jul 10 '16 at 11:07



2 Answers

You can use (?:[^%\d]|^|\b(?=%))(?:%%)*(\d+) as a pattern; your number is stored in the first capturing group. This also matches numbers preceded by zero % characters, since zero counts as even.

This matches an even run of % signs if it is preceded by:

  • a character that is neither % nor a digit (we must not consume the last digit before a %, because in chains like %%1%%2 that digit already belongs to the previous match)
  • the start of the string
  • a word boundary followed by % (i.e. a word character, such as the last digit of a previous number, directly before the %-run), which handles the chains mentioned above

You can see it in action here
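Here is a minimal JavaScript sketch of this pattern in use. It assumes `String.prototype.matchAll` (ES2020) is available. Because a match can include one leading non-% character, the hypothetical `replaceNumbers` helper keeps everything in the match except the captured digits and substitutes "(.+)" only for those.

```javascript
// The answer's pattern; the digits are capture group 1.
const proske = /(?:[^%\d]|^|\b(?=%))(?:%%)*(\d+)/g;

// Hypothetical helper: numbers preceded by an even run of %.
const evenNumbers = (str) => Array.from(str.matchAll(proske), (m) => m[1]);

// Hypothetical helper: keep the prefix and % signs, replace only the digits.
const replaceNumbers = (str) =>
  str.replace(proske, (m, digits) => m.slice(0, m.length - digits.length) + "(.+)");

console.log(evenNumbers("abcd %1 %%2 %%%3 %%%%4 efgh")); // [ '2', '4' ]
console.log(evenNumbers("abcd%12%%34%%%666%%%%11efgh")); // [ '34', '11' ]
console.log(replaceNumbers("abcd %1 %%2 %%%3 %%%%4 efgh")); // abcd %1 %%(.+) %%%3 %%%%(.+) efgh
console.log(replaceNumbers("abcd%12%%34%%%666%%%%11efgh")); // abcd%12%%(.+)%%%666%%%%(.+)efgh
```

The `\b(?=%)` alternative is what lets "%%34" match directly after "12": the zero-width boundary sits between the 2 and the first %, so no digit of the previous number is consumed.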

Answered by Sebastian Proske on Oct 06 '22 at 18:10


Issue

You want a regex with a positive, variable-width (infinite-width) lookbehind:

(?<=(^|[^%])(?:%%)*)\d+

Here is the .NET regex demo

ES7 does not support lookbehind at all (lookbehind assertions, including variable-width ones, were only added in ECMAScript 2018), so you need language-specific means. Match any number of % before a digit sequence with a simplified regex, /(%*)(\d+)/g, then check inside the replace callback whether the number of percentage signs is even and proceed accordingly.
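Engines that implement ECMAScript 2018 lookbehind (current Node.js, for example) accept variable-width lookbehinds, so the pattern above can also be tried in JavaScript directly. One caveat is worth checking. As written, the lookbehind is also satisfied in the middle of a number, because a digit counts as `[^%]`. On the second playground line it therefore matches the "2" of "12" and the "66" of "666". The sketch below adds a `(?<!\d)` guard, which is not part of the answer, so that a match can only start at the first digit of a run.

```javascript
// Requires ES2018 lookbehind support (e.g. Node.js 10 or later).
// The answer's pattern, with its capturing group made non-capturing:
const plain = /(?<=(?:^|[^%])(?:%%)*)\d+/g;
// Same, plus (?<!\d) so a match cannot start inside a digit run:
const anchored = /(?<=(?:^|[^%])(?:%%)*)(?<!\d)\d+/g;

const secondLine = "abcd%12%%34%%%666%%%%11efgh";
console.log(secondLine.match(plain));    // [ '2', '34', '66', '11' ]
console.log(secondLine.match(anchored)); // [ '34', '11' ]
console.log(secondLine.replace(anchored, "(.+)")); // abcd%12%%(.+)%%%666%%%%(.+)efgh
```

Since the lookbehinds consume nothing, the match is just the digits, and a plain string replacement is enough.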

JavaScript

Instead of trying to emulate a variable-width lookbehind, you can simply use plain JavaScript:

var re = /(%*)(\d+)/g;  // Group 1: zero or more percentage signs, Group 2: the digits
var str = 'abcd %1 %%2 %%%3 %%%%4 efgh<br/><br/>abcd%12%%34%%%666%%%%11efgh';
var res = str.replace(re, function(m, g1, g2) {  // Use a callback inside replace
  // If the run of % is even, return Group 1 + (.+); otherwise return the whole match unchanged
  return (g1.length % 2 === 0) ? g1 + '(.+)' : m;
});
document.body.innerHTML = res;  // Browser demo: <br/> renders the line break

If the digits must be preceded by at least two % signs (so that zero % no longer counts as even), use the /(%+)(\d+)/g pattern: %+ requires at least one percentage sign, and the even-length check then rules out a single one.
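A small illustration of the difference. It uses a made-up sample string, "abcd1 %%2 %%%3", and the same even-length callback as above. With `%*` a bare number counts as preceded by zero (even) % signs and is replaced. With `%+` it is left alone.

```javascript
// Same logic as the callback above: replace the digits only if the %-run is even.
const evenOnly = (m, percents) => (percents.length % 2 === 0 ? percents + "(.+)" : m);

const sample = "abcd1 %%2 %%%3";
console.log(sample.replace(/(%*)(\d+)/g, evenOnly)); // abcd(.+) %%(.+) %%%3
console.log(sample.replace(/(%+)(\d+)/g, evenOnly)); // abcd1 %%(.+) %%%3
```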

Conversion to C++

The same algorithm can be used in C++. The only problem is that std::regex_replace has no built-in support for a replacement callback. One can be added manually and used like this:

#include <algorithm>  // std::for_each
#include <iostream>
#include <iterator>   // std::advance
#include <regex>
#include <string>

// Callback-based overload: f(match) supplies the replacement for each match.
template<class BidirIt, class Traits, class CharT, class UnaryFunction>
std::basic_string<CharT> regex_replace(BidirIt first, BidirIt last,
    const std::basic_regex<CharT,Traits>& re, UnaryFunction f)
{
    std::basic_string<CharT> s;

    typename std::match_results<BidirIt>::difference_type
        positionOfLastMatch = 0;
    auto endOfLastMatch = first;

    auto callback = [&](const std::match_results<BidirIt>& match)
    {
        // position(0) counts from `first`, so diff is the length of the
        // unmatched text between the previous match and this one.
        auto positionOfThisMatch = match.position(0);
        auto diff = positionOfThisMatch - positionOfLastMatch;

        auto startOfThisMatch = endOfLastMatch;
        std::advance(startOfThisMatch, diff);

        s.append(endOfLastMatch, startOfThisMatch);  // copy unmatched text
        s.append(f(match));                          // append the replacement

        auto lengthOfMatch = match.length(0);

        positionOfLastMatch = positionOfThisMatch + lengthOfMatch;

        endOfLastMatch = startOfThisMatch;
        std::advance(endOfLastMatch, lengthOfMatch);
    };

    // Iterate with the iterator type that matches BidirIt
    // (std::sregex_iterator only accepts std::string::const_iterator).
    std::regex_iterator<BidirIt, CharT, Traits> begin(first, last, re), end;
    std::for_each(begin, end, callback);

    s.append(endOfLastMatch, last);  // copy the text after the last match

    return s;
}

template<class Traits, class CharT, class UnaryFunction>
std::string regex_replace(const std::string& s,
    const std::basic_regex<CharT,Traits>& re, UnaryFunction f)
{
    return regex_replace(s.cbegin(), s.cend(), re, f);
}

// Replace the digits with "(.+)" only when the run of % before them is even.
std::string my_callback(const std::smatch& m) {
  if (m.str(1).length() % 2 == 0) {
    return m.str(1) + "(.+)";
  } else {
    return m.str(0);  // odd number of %: leave the match untouched
  }
}

int main() {
    std::string s = "abcd %1 %%2 %%%3 %%%%4 efgh\n\nabcd%12%%34%%%666%%%%11efgh";
    std::cout << regex_replace(s, std::regex("(%*)(\\d+)"), my_callback) << std::endl;

    return 0;
}

See the IDEONE demo.

Special thanks for the callback code goes to John Martin.
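The splicing loop in the C++ helper can be sketched in JavaScript with `RegExp.prototype.exec`. The loop copies the text between matches, appends whatever the callback returns, and then copies the tail. `replaceWithCallback` is a hypothetical name, and the sketch only illustrates the algorithm, because `String.prototype.replace` in JavaScript already accepts a callback.

```javascript
// Hypothetical helper mirroring the C++ regex_replace overload:
// copy the unmatched text, then append whatever the callback returns.
function replaceWithCallback(str, re, f) {
  if (!re.global) throw new Error("re must have the g flag");
  re.lastIndex = 0;
  let out = "";
  let endOfLastMatch = 0;
  let m;
  while ((m = re.exec(str)) !== null) {
    out += str.slice(endOfLastMatch, m.index); // text between matches
    out += f(m);                               // callback's replacement
    endOfLastMatch = m.index + m[0].length;
    if (m[0].length === 0) re.lastIndex++;     // avoid looping on empty matches
  }
  return out + str.slice(endOfLastMatch);      // tail after the last match
}

// Same decision as my_callback: m[1] is the run of %, m[0] the whole match.
const evenOnlyMatch = (m) => (m[1].length % 2 === 0 ? m[1] + "(.+)" : m[0]);

const replaced = replaceWithCallback("abcd%12%%34%%%666%%%%11efgh", /(%*)(\d+)/g, evenOnlyMatch);
console.log(replaced); // abcd%12%%(.+)%%%666%%%%(.+)efgh
```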

Answered by Wiktor Stribiżew on Oct 06 '22 at 18:10