How do I parse a chemical formula using a regular expression?

Tags: python, string, regex

I have a list called patterns:

patterns=['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg', 'Al',
       'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn',
       'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb',
       'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In',
       'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Pm',
       'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta',
       'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At',
       'Rn']

and I have a big dataframe of strings, for example:

str0='Mg0.97Fe0.03B2'
str1='Tl0.5Hg0.5Ba2Ca2Cu3O8'

I am trying this:

keyss=list(filter(None,regex.split("[^a-zA-Z]*",somestring)))
values=list(filter(None,regex.split("[^0-9.0-9]*",somestring)))

Sometimes, this works:

str3='Ba1Fe1.832Co0.15Mn0.018As2'
keyss=list(filter(None,regex.split("[^a-zA-Z]*",str3)))
values=list(filter(None,regex.split("[^0-9.0-9]*",str3)))
['Ba', 'Fe', 'Co', 'Mn', 'As']
['1', '1.832', '0.15', '0.018', '2']

However, if I have a string like this:

str3='Hg0.75SrBa2Ca2Cu3O8'
keyss=list(filter(None,regex.split("[^a-zA-Z]*",str3)))
values=list(filter(None,regex.split("[^0-9.0-9]*",str3)))
['Hg', 'SrBa', 'Ca', 'Cu', 'O']!=['Hg', 'Sr','Ba', 'Ca', 'Cu', 'O']
['0.75', '2', '2', '3', '8']!=['0.75', '1','2', '2', '3', '8']

or this:

str4='NbSn3'
keyss=list(filter(None,regex.split("[^a-zA-Z]*",str4)))
values=list(filter(None,regex.split("[^0-9.0-9]*",str4)))
['NbSn']!=['Nb','Sn']
['3']!=['1','3']
str4='Pb1.4Sr4Y1.2Ca0.8Cu4.6O'
...

My code does not work correctly when an element has no explicit number after it. How can I fix it?

237

asked Dec 07 '20 10:12

Oleg

1 Answer

I guess you started with patterns and then dropped the idea. The list is probably not needed here (though you could use it in a pyparsing grammar); there is indeed a simpler approach that follows your later idea of splitting with a regular expression.
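As an aside, if you do want to use patterns, it does not have to be pyparsing. The following is only a rough sketch, assuming the standard-library re module and a shortened stand-in for your list; the helper name tokenize is hypothetical. It builds one alternation from the known symbols and raises an error on anything that is not a listed element:

```python
import re

# Shortened stand-in for the full `patterns` list from the question.
patterns = ['H', 'He', 'B', 'C', 'N', 'O', 'Nb', 'Sn', 'S', 'Sr',
            'Ba', 'Ca', 'Cu', 'Hg']

# Try longer symbols first so 'Sn' is not read as 'S' followed by 'n'.
alternation = "|".join(sorted(patterns, key=len, reverse=True))
# A known symbol, then an optional integer or decimal count.
token = re.compile(rf"({alternation})(\d+(?:\.\d+)?)?")

def tokenize(formula):
    """Return (symbol, count) pairs; raise ValueError on unknown text."""
    pairs, pos = [], 0
    while pos < len(formula):
        m = token.match(formula, pos)
        if m is None:
            raise ValueError(f"unknown symbol at position {pos}: {formula[pos:]!r}")
        # A missing count is treated as '1' (an assumption of this sketch).
        pairs.append((m.group(1), m.group(2) or "1"))
        pos = m.end()
    return pairs

print(tokenize("NbSn3"))
```

The extra validation is the only real gain over a generic pattern, so for clean data the simpler route is usually enough.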

I suggest you do something like this:

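import re as regex  # assumption: standard-library re; the third-party regex package behaves the same here

str3 = 'Hg0.75SrBa2Ca2Cu3O8'
# The capturing group keeps the element symbols in the split result
splitted = regex.split("([A-Z][a-z]*)", str3)
# Symbols start with an uppercase letter, counts start with a digit
keyss = [s for s in splitted if s and s[0].isupper()]
values = [s for s in splitted if s and s[0].isdigit()]
print(keyss, values)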

['Hg', 'Sr', 'Ba', 'Ca', 'Cu', 'O'] ['0.75', '2', '2', '3', '8']
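Note that the values above contain no entry for Sr, whereas the question expects an implicit '1' there. One possible way to get that, sketched here under the assumption that the standard-library re module is acceptable (the helper name parse_formula is hypothetical, not part of the code above), is to match each symbol together with an optional number and substitute '1' when the number is missing:

```python
import re

# Hypothetical helper, not from the original answer.
# One capital letter plus optional lowercase letters, then an optional
# integer or decimal count.
TOKEN = re.compile(r"([A-Z][a-z]*)(\d+(?:\.\d+)?)?")

def parse_formula(formula):
    """Return (symbols, counts); a missing count becomes '1'."""
    # findall yields (symbol, count) tuples; count is '' when absent.
    pairs = TOKEN.findall(formula)
    symbols = [sym for sym, _ in pairs]
    counts = [num or "1" for _, num in pairs]
    return symbols, counts

print(parse_formula("Hg0.75SrBa2Ca2Cu3O8"))
print(parse_formula("NbSn3"))
```

Because each symbol and its count are captured as a pair, the two lists always have the same length, which makes it straightforward to zip them into a dict if that is more convenient.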

114

answered Sep 21 '22 14:09

sophros
