
Regex match count of characters that are separated by non-matching characters

I want to count characters, but they may be separated by characters that do not match.

Here is an example. I want to match text that has 10 or more word characters. It may include spaces, but I don't want to count the spaces.

Should not match: "foo bar baz" (should count 9)
Should not match: "a          a" (should count 2)
Should match: "foo baz bars" (should count 10, match whole string)

This is what I came up with, but it counts the whole thing:

((?<=\s)*\w(?=\s)*){10}

Edit: I do not want to include spaces when counting. Sorry, I edited this a few times because I didn't describe it correctly.

Any ideas on this?

jomo asked Aug 09 '13 10:08



2 Answers

I think this would be a simple but working one:

( *?[0-9a-zA-Z] *?){10,}

Breaking the regex down:

  1. ( *? -------- Optional leading space(s), matched lazily
  2. [0-9a-zA-Z] - Followed by exactly one alphanumeric character
  3.  *?) -------- Optional trailing space(s), matched lazily
  4. {10,} ------- The whole group repeats 10 or more times

Key: the count in a regex quantifier applies to the whole group, i.e., everything inside the parentheses "()". In this case, any number of spaces followed by ONE alphanumeric character followed by any number of spaces still counts as a single repetition. Hope it helps. :)
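To check the pattern against the three strings from the question, here is a minimal sketch in Python. The language choice and the helper name has_ten_alnum are assumptions made for illustration; the pattern itself is copied unchanged from above. Because the pattern has no anchors, re.search reports a qualifying run anywhere in the text rather than requiring the whole string to match.

```python
import re

# Pattern from the answer above, unchanged. It is unanchored, so
# re.search looks for a qualifying run anywhere in the text.
PATTERN = re.compile(r"( *?[0-9a-zA-Z] *?){10,}")


def has_ten_alnum(text):
    """Hypothetical helper: True if the text contains 10 or more
    alphanumeric characters separated only by spaces."""
    return PATTERN.search(text) is not None


# The three strings from the question.
for sample in ["foo bar baz", "a          a", "foo baz bars"]:
    print(repr(sample), has_ten_alnum(sample))
```

Note that [0-9a-zA-Z] is narrower than \w: it excludes the underscore (and, in Python 3, non-ASCII letters), so "foo_baz_bars" is not accepted by this pattern.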

Juto answered Oct 12 '22 14:10


Use a group that consumes spaces with each single word char, and count the groups:

^(\s*\w){10,}\s*$
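
As a quick check of this anchored pattern, here is a minimal Python sketch. Python and the helper name has_ten_word_chars are assumptions for illustration; the regex is the one given above. Each repetition of the group consumes exactly one word character plus any whitespace in front of it, so the {10,} quantifier effectively counts word characters, and the ^ and $ anchors force the whole string to fit.

```python
import re

# Anchored pattern from the answer above: each repetition of the group
# consumes optional whitespace plus exactly one word character.
ANCHORED = re.compile(r"^(\s*\w){10,}\s*$")


def has_ten_word_chars(text):
    """Hypothetical helper: True if the whole text consists of 10 or
    more word characters with optional whitespace between or around them."""
    return ANCHORED.match(text) is not None


# The three strings from the question, plus one with surrounding spaces.
for sample in ["foo bar baz", "a          a", "foo baz bars", "  foo baz bars  "]:
    print(repr(sample), has_ten_word_chars(sample))
```

Since \s and \w never match the same character, there is only one way to split the string between the two, which keeps backtracking small. A string such as "foo-baz-bars" is rejected because the hyphen is neither whitespace nor a word character.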

Bohemian answered Oct 12 '22 13:10