I'm working on an application that calculates molecular weight, and I need to split a formula string into its individual elements, each with its count. I've been using a regex to do this but I haven't quite gotten it to work. I need the regex to match patterns like H2OCl4 and Na2H2O, breaking them into matches like H2, O, Cl4 and Na2, H2, O.
The regex I've been working on is this:
([A-Z]\d*|[A-Z]*[a-z]\d*)
It's really close, but it currently breaks H2OCl4 into H2, O, C, l4.
I need the Cl4 to be considered one match. Can anyone help me with the last part I'm missing here? I'm pretty new to regular expressions. Thanks.
Using character sets: for example, the regular expression "[A-Za-z]" matches any single uppercase or lowercase letter. Inside a character set, a hyphen indicates a range of characters, so [A-Z] matches any one capital letter.
\w (word character) matches any single letter, digit, or underscore (the same as [a-zA-Z0-9_] in ASCII mode; Unicode-aware engines may also accept other letters and digits). The uppercase counterpart \W (non-word character) matches any single character that \w does not match (the same as [^a-zA-Z0-9_]). For the shorthand classes \w, \d, and \s, the uppercase form (\W, \D, \S) is the inverse of the lowercase one.
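As a small illustration of the character classes described above, here is a sketch in Python (the language is my choice, not something the thread specifies). The sample string "H2_o!" is made up, and re.ASCII is passed so that \w behaves exactly like [a-zA-Z0-9_].

```python
import re

SAMPLE = "H2_o!"  # made-up input, for illustration only

# [A-Za-z] matches exactly one ASCII letter at a time.
letters = re.findall(r"[A-Za-z]", SAMPLE)

# \w also accepts digits and the underscore; re.ASCII restricts it
# to [a-zA-Z0-9_] instead of Python's default Unicode-aware class.
word_chars = re.findall(r"\w", SAMPLE, flags=re.ASCII)

# \W is the inverse of \w.
non_word_chars = re.findall(r"\W", SAMPLE, flags=re.ASCII)

print(letters)         # ['H', 'o']
print(word_chars)      # ['H', '2', '_', 'o']
print(non_word_chars)  # ['!']
```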
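Basically, (0+1)* matches any sequence of ones and zeroes, including the empty one. Here (0+1) means 0 OR 1; this is formal-language notation, not the + quantifier used by most regex engines. So, in your example, (0+1)*1(0+1)* should match any sequence that contains at least one 1. It would not match 000, but it would match 010, 1, 111, etc.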
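To try this in a programming-language regex engine, the formal "0 OR 1" has to be written as a character set or an alternation. A minimal sketch in Python, assuming [01] as the translation of (0+1); the function name has_one is hypothetical:

```python
import re

# Formal (0+1)*1(0+1)* rewritten for Python's re module:
# (0+1) becomes the character set [01].
CONTAINS_ONE = re.compile(r"[01]*1[01]*")

def has_one(bits: str) -> bool:
    """Return True if the whole string is binary and contains a 1."""
    return CONTAINS_ONE.fullmatch(bits) is not None

print(has_one("000"))  # False
print(has_one("010"))  # True
print(has_one("1"))    # True
print(has_one("111"))  # True
```

fullmatch is used so the pattern must cover the entire string, which mirrors how a formal regular expression describes whole strings rather than substrings.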
I think what you want is "[A-Z][a-z]?\d*"
That is, a capital letter, followed by an optional small letter, followed by an optional string of digits.
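Here is a quick way to compare the two patterns, sketched in Python (the thread does not name a language, so that choice and the helper name split_formula are assumptions). The original pattern lets a lone capital letter match on its own, which is why Cl4 gets split; the suggested one ties the optional lowercase letter to the capital before it.

```python
import re

ORIGINAL = r"([A-Z]\d*|[A-Z]*[a-z]\d*)"  # pattern from the question
SUGGESTED = r"[A-Z][a-z]?\d*"            # pattern from this answer

def split_formula(formula: str, pattern: str = SUGGESTED) -> list[str]:
    """Return every non-overlapping match of pattern in formula."""
    return [m.group(0) for m in re.finditer(pattern, formula)]

print(split_formula("H2OCl4", ORIGINAL))  # ['H2', 'O', 'C', 'l4']
print(split_formula("H2OCl4"))            # ['H2', 'O', 'Cl4']
print(split_formula("Na2H2O"))            # ['Na2', 'H2', 'O']
```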
If you want to match 0, 1, or 2 lower-case letters, then you can write:
"[A-Z][a-z]{0,2}\d*"
Note, however, that both of these regular expressions assume that the input data is valid. Given bad data, they silently skip over it. For example, if the input string is "H2ClxxzSO4", the first expression gives you H2, Cl, S, and O4, and the xxz is dropped without any error.
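If you want to detect bad data, you'll need to check the Index property of each returned Match object to ensure that it is equal to the index where the previous match ended (0 for the first match), and that the last match ends at the end of the string.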