I am currently working on a program that can parse a chemical formula and return molecular weight and percent composition. The following code works very well with compounds such as H2O, LiOH, CaCO3, and even C12H22O11. However, it is not capable of understanding compounds with polyatomic ions that lie within parenthesis, such as (NH4)2SO4.
I am not looking for someone to necessarily write the program for me, but just give me a few tips on how I might accomplish such a task.
Currently, the program iterates through the inputted string, raw_molecule
, first finding each element's atomic number, to store in a vector (I use a map<string, int>
to store names and atomic #). It then finds the quantities of each element.
bool Compound::parseString() {
map<string,int>::const_iterator search;
string s_temp;
int i_temp;
for (int i=0; i<=raw_molecule.length(); i++) {
if ((isupper(raw_molecule[i]))&&(i==0))
s_temp=raw_molecule[i];
else if(isupper(raw_molecule[i])&&(i!=0)) {
// New element- so, convert s_temp to atomic # then store in v_Elements
search=ATOMIC_NUMBER.find (s_temp);
if (search==ATOMIC_NUMBER.end())
return false;// There is a problem
else
v_Elements.push_back(search->second); // Add atomic number into vector
s_temp=raw_molecule[i]; // Replace temp with the new element
}
else if(islower(raw_molecule[i]))
s_temp+=raw_molecule[i]; // E.g. N+=a which means temp=="Na"
else
continue; // It is a number/parentheses or something
}
// Whatever's in temp must be converted to atomic number and stored in vector
search=ATOMIC_NUMBER.find (s_temp);
if (search==ATOMIC_NUMBER.end())
return false;// There is a problem
else
v_Elements.push_back(search->second); // Add atomic number into vector
// --- Find quantities next --- //
for (int i=0; i<=raw_molecule.length(); i++) {
if (isdigit(raw_molecule[i])) {
if (toInt(raw_molecule[i])==0)
return false;
else if (isdigit(raw_molecule[i+1])) {
if (isdigit(raw_molecule[i+2])) {
i_temp=(toInt(raw_molecule[i])*100)+(toInt(raw_molecule[i+1])*10)+toInt(raw_molecule[i+2]);
v_Quantities.push_back(i_temp);
}
else {
i_temp=(toInt(raw_molecule[i])*10)+toInt(raw_molecule[i+1]);
v_Quantities.push_back(i_temp);
}
}
else if(!isdigit(raw_molecule[i-1])) { // Look back to make sure the digit is not part of a larger number
v_Quantities.push_back(toInt(raw_molecule[i])); // This will not work for polyatomic ions
}
}
else if(i<(raw_molecule.length()-1)) {
if (isupper(raw_molecule[i+1])) {
v_Quantities.push_back(1);
}
}
// If there is no number, there is only 1 atom. Between O and N for example: O is upper, N is upper, O has 1.
else if(i==(raw_molecule.length()-1)) {
if (isalpha(raw_molecule[i]))
v_Quantities.push_back(1);
}
}
return true;
}
This is my first post, so if I have included too little (or maybe too much) information, please forgive me.
While you might be able to do an ad-hoc scanner-like thing that can handle one level of parens, the canonical technique used for things like this is to write a real parser.
And there are two common ways to do that...
(And technically, there is a third category, PEG, that is machine-generated-top-down.)
Anyway, for case 1, you need to code a recursive call to your parser when you see a (
and then return from this level of recursion on the )
token.
Typically a tree-like internal representation is created; this is called a syntax tree, but in your case, you can probably skip that and just return the atomic weight from the recursive call, adding to the level you will be returning from the first instance.
For case 2, you need to use a tool like yacc to turn a grammar into a parser.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With