I'm trying to make a system to factorise (if that term is correct in Chemistry) a given expanded chemical formula, such as C6H2NO2NO2NO2CH3
into brackets so that it is C₆H₂(NO₂)₃CH₃
. (Ignore the subscript, or lack thereof in the first instance). Problem is, I don't know what the repeated molecule is going to be, or even how long it will be. How would I find and count repeats?
For context, here's my code so far which generates the formula from a 2D list of elements:
private String getFormula(List<List<Element>> elements)
{
String formula = ""; //TODO Switch out for StringBuilder
for(List<Element> currentElement : elements)
{
formula += currentElement.get(0).getSymbol(); //Every element per list is identical, so looking at 0 will always be safe
if(currentElement.size() > 1) formula += currentElement.size(); //Only display a number if there is more than 1 element
}
return formula;
}
EDIT Updated so that it considers order. Thanks @JakeStanger!
2nd EDIT Updated to reflect new condition where molecule ends with |
I used a regular expression to split by |
since from the given String
we know that a new molecule starts after |
. I used a Hashmap
to keep track of how many molecules of each type were given. In the end I iterated through each value in the Hashmap
and appended to String result
depending on whether it was one molecule or not. Cheers!
public static String factorise(String input) {
String result = "";
Map<String, Integer> molecules = new LinkedHashMap<>();
String[] res = input.split("\\|");
for (String t : res) {
//Check if we already have this element in our map
if (!molecules.containsKey(t)) {
//If not then add it and set the count to 1
molecules.put(t, 1);
} else {
//If we do then update the count by 1
molecules.put(t, molecules.get(t) + 1);
}
}
//Iterate through each molecule
for (String key : molecules.keySet()) {
if (molecules.get(key) == 1) {
//If the count is only at one, then we just need to append it.
result += key;
} else {
//Otherwise, we need the parentheces and the number of repetitions followed after
result = result + "(" + key + ")" + molecules.get(key);
}
}
return result;
}
Running
System.out.println(factorise("C6|H2|NO2|NO2|NO2|CH3|OH|OH"));
System.out.println(factorise("HO|HO"));
Yields the following when run:
run:
C6H2(NO2)3CH3(OH)2
(HO)2
BUILD SUCCESSFUL (total time: 0 seconds)
This answer gives the code: String result = source.replaceAll("(.+)\\1+", "$1")
, which replaces all repeated substrings.
I'm thinking that a slight modification of that code should do what you want. If you use "($1)"
as the replacement, it will wrap the match in parenthesis. You could probably step through the replacement and determine what number should come after the parenthesis.
To prevent the regex from capturing preceding numbers, try "([A-Za-z]+[1-9]*)\\1+"
.
This link explains how to count number of matches. It's a bit more complex:
Pattern pattern = Pattern.compile("([A-Za-z]+[1-9]*)\\1+");
Matcher matcher = pattern.matcher(YOUR_CHEM_STRING);
int count = 0;
String prior="";
while (matcher.find()){
if(m.group().equals(prior){
count++;
}else{
YOUR_CHEM_STRING.replaceAll("([A-Za-z]+[1-9]*)\\1+","($1)"+count);
count=0;
}
}
You count split up the formula elements into a list and then parse it, counting consecutive repeats. When the count is greater than 1 you will add it with parenthesis.
String formula = "C6H2NO2NO2NO2CH3";
Pattern p = Pattern.compile("[a-zA-Z]+[0-9]+");
Matcher m = p.matcher(formula);
List<String> parts = new ArrayList<String>();
while(m.find()) {
parts.add(m.group());
}
String shrink = "";
int count = 0;
for(int i=0; i<parts.size(); i++) {
count++;
if(i+1 == parts.size() || !parts.get(i+1).equals(parts.get(i))) {
if(count == 1)
shrink += parts.get(i);
else
shrink += "("+parts.get(i)+")"+count;
count = 0;
}
}
System.out.println(shrink); // result = "C6H2(NO2)3CH3"
If you are able to send the elements list, try this:
public static String shortForumla(List<List<Element>> elements) {
String shrink = "";
int count = 0;
for(int i=0; i<elements.size(); i++) {
String symbol = elements.get(i).get(0).symbol();
if(i+1 == elements.size() || !elements.get(i+1).get(0).symbol().equals(symbol)) {
if(count == 1)
shrink += symbol;
else
shrink += "("+symbol+")"+count;
count = 0;
}
}
return shrink;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With