Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting the number of elements in a molecular compound with Python (recursion if possible)?

So I'm trying to code something to tell me the number of elements in any given compound. I'm not even sure where to start: I tried coding something but then realized that it only worked for simple compounds (or did not work at all). Here's an example of what I want:

>>> function_input : 'NaMg3Al6(BO3)3Si6O18(OH)4', 'O'

>>> function_return : 31

I've come so far in my mess of a code (IT DOESN'T WORK, it just illustrates my rough thought process):

def get_pos_from_count(string: str, letter: str):
    count = string.count(letter)
    lens = [-1]
    for i in range(count):
        lens += [string[lens[i] + 1:].index(letter) + lens[i] + 1]
    return lens[1:]



def find_number(string, letter):
    if string.count(letter) == 0: return 0
    numbers = '1234567890'
    try:
        mul1 = int(string[0])
    except ValueError:
        mul1 = 0
    mul2 = []
    sub_ = 1
    list_of_positions = get_pos_from_count(string, letter)
    for i in list_of_positions:
        try:
            sub_ += int(string[i + 1]) if string[i + 1] in numbers else 0
        except IndexError: pass
        if string[i + 1:].count(')') > string[i + 1].count('('):
            try:
                mul2 += int(string[string[i + 1:].count(')') + 1])
            except (IndexError, ValueError): pass
    return mul1 * sub_ * mul2

The approach I was trying to implement was:

  1. Find the number of occurrences of said element in compound.
  2. Find each subscript, multiplying by subscript outside bracket if said element is in bracket.
  3. Sum up all subscripts, multiply by number of compounds (first character in string)
  4. Return said number to user

But then I realized my code would either be extremely long or require recursion, which I don't know how to apply here.

If possible, I'd like a semi-working function, but a quick tip on how to approach this is also helpful!

And I don't want to use external libraries if possible.

tl;dr: This question for atomicity of elements, without external libraries (if possible).

EDIT: Yes, the question I linked does have hints on how to do this, but when I tried to make any code work for only one element, and set it's weight to 1, I ran into a host of issues I don't know to solve.

like image 598
Codeman Avatar asked Nov 16 '25 14:11

Codeman


1 Answers

Let us divide the task into three parts:

  • Tokenize the string into a list of elements, numbers, and brackets;
  • Parse the bracket to have a nested list with sublists;
  • Count the elements in the nested list.

Introducing my tools:

  • Tokenize: more_itertools.split_when;
  • Parsing brackets: recursion;
  • Counting elements: collections.counter.

The code:

from more_itertools import split_when, pairwise
from itertools import chain
from collections import Counter

def nest_brackets(tokens, i = 0):
    l = []
    while i < len(tokens):
        if tokens[i] == ')':
            return i,l
        elif tokens[i] == '(':
            i,subl = nest_brackets(tokens, i+1)
            l.append(subl)
        else:
            l.append(tokens[i])
        i += 1
    return i,l

def parse_compound(s):
    tokens = [''.join(t) for t in split_when(s, lambda a,b: b.isupper() or b in '()' or (b.isdigit() and not a.isdigit()))]
    tokens = [(int(t) if t.isdigit() else t) for t in tokens]
    i, l = nest_brackets(tokens)
    assert(i == len(tokens)) # crash if unmatched ')'
    return l

def count_elems(parsed_compound):
    c = Counter()
    for a,b in pairwise(chain(parsed_compound, (1,))):
        if not isinstance(a, int):
            subcounter = count_elems(a) if isinstance(a, list) else {a: 1}
            n = b if isinstance(b, int) else 1
            for elem,k in subcounter.items():
                c[elem] += k * n
    return c

s = 'NaMg3Al6(B(CO2)3)3Si6O18(OH)4'

l = parse_compound(s)
print(l)
# ['Na', 'Mg', 3, 'Al', 6, ['B', ['C', 'O', 2], 3], 3, 'Si', 6, 'O', 18, ['O', 'H'], 4]


c = count_elems(l)
print(c)
# Counter({'O': 40, 'C': 9, 'Al': 6, 'Si': 6, 'Mg': 3, 'B': 3, 'Na': 1})

print(c['O'])
# 40
like image 60
Stef Avatar answered Nov 18 '25 03:11

Stef



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!