Regex to match on capital letter, digit or capital, lowercase, and digit

Tags: string, c#, regex

I'm working on an application which will calculate molecular weight and I need to separate a string into the different molecules. I've been using a regex to do this but I haven't quite gotten it to work. I need the regex to match on patterns like H2OCl4 and Na2H2O where it would break it up into matches like:

  1. H2
  2. O
  3. Cl4

  1. Na2
  2. H2
  3. O

The regex I've been working on is this:

([A-Z]\d*|[A-Z]*[a-z]\d*)

It's really close but it currently breaks the matches into this:

  1. H2
  2. O
  3. C
  4. l4

I need the Cl4 to be considered one match. Can anyone help me with the last part I'm missing here? I'm pretty new to regular expressions. Thanks.

CoderX_599 asked Feb 04 '11 20:02


1 Answer

I think what you want is "[A-Z][a-z]?\d*"

That is, a capital letter, followed by an optional small letter, followed by an optional string of digits.
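As a rough illustration of how that pattern behaves on the two inputs from the question, here is a minimal C# sketch. The class and method names (FormulaSplitter, Split) are made up for this example and are not from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class FormulaSplitter
{
    // Capital letter, optional lowercase letter, optional run of digits.
    private static readonly Regex Token = new Regex(@"[A-Z][a-z]?\d*");

    // Returns each match as a string, in the order it appears in the input.
    public static string[] Split(string formula)
    {
        return Token.Matches(formula)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

With this sketch, FormulaSplitter.Split("H2OCl4") yields H2, O, Cl4 and FormulaSplitter.Split("Na2H2O") yields Na2, H2, O.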

If you want to match 0, 1, or 2 lower-case letters, then you can write:

"[A-Z][a-z]{0,2}\d*"

Note, however, that both of these regular expressions assume that the input data is valid. Given invalid input, they silently skip any characters they cannot match. For example, if the input string is "H2ClxxzSO4", the second expression returns the following (the first returns Cl rather than Clx):

  1. H2
  2. Clx
  3. S
  4. O4

If you want to detect bad data, check the Index property of each returned Match object: it should equal the position where the previous match ended (0 for the first match), and the last match should end at the end of the string.
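One possible way to do that check is sketched below. It assumes you want an exception on the first skipped character. FormulaValidator and SplitStrict are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class FormulaValidator
{
    private static readonly Regex Token = new Regex(@"[A-Z][a-z]?\d*");

    // Returns the tokens, or throws FormatException if the regex
    // had to skip over any part of the input.
    public static List<string> SplitStrict(string formula)
    {
        var tokens = new List<string>();
        int expected = 0; // index at which the next match must begin

        foreach (Match m in Token.Matches(formula))
        {
            // A gap between the previous match and this one means bad data.
            if (m.Index != expected)
            {
                string skipped = formula.Substring(expected, m.Index - expected);
                throw new FormatException(
                    $"Unexpected text '{skipped}' at position {expected}");
            }
            tokens.Add(m.Value);
            expected = m.Index + m.Length;
        }

        // Trailing characters after the last match are also bad data.
        if (expected != formula.Length)
        {
            throw new FormatException(
                $"Unexpected text '{formula.Substring(expected)}' at position {expected}");
        }

        return tokens;
    }
}
```

With this sketch, SplitStrict("H2OCl4") returns H2, O, Cl4, while SplitStrict("H2ClxxzSO4") throws because the characters xxz at position 4 are not covered by any match.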

Jim Mischel answered Nov 14 '22 00:11