I have a list of patterns:
patterns=['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg', 'Al',
'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn',
'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb',
'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In',
'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Pm',
'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta',
'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At',
'Rn']
and I have a big dataframe of strings, for example:
str0='Mg0.97Fe0.03B2'
str1='Tl0.5Hg0.5Ba2Ca2Cu3O8'
I am trying this:
keyss=list(filter(None,regex.split("[^a-zA-Z]*",somestring)))
values=list(filter(None,regex.split("[^0-9.0-9]*",somestring)))
Sometimes, this works:
str2='Ba1Fe1.832Co0.15Mn0.018As2'
keyss=list(filter(None,regex.split("[^a-zA-Z]*",str2)))
values=list(filter(None,regex.split("[^0-9.0-9]*",str2)))
['Ba', 'Fe', 'Co', 'Mn', 'As']
['1', '1.832', '0.15', '0.018', '2']
However, if I have a string like this:
str3='Hg0.75SrBa2Ca2Cu3O8'
keyss=list(filter(None,regex.split("[^a-zA-Z]*",str3)))
values=list(filter(None,regex.split("[^0-9.0-9]*",str3)))
['Hg', 'SrBa', 'Ca', 'Cu', 'O']!=['Hg', 'Sr','Ba', 'Ca', 'Cu', 'O']
['0.75', '2', '2', '3', '8']!=['0.75', '1','2', '2', '3', '8']
or this:
str4='NbSn3'
keyss=list(filter(None,regex.split("[^a-zA-Z]*",str4)))
values=list(filter(None,regex.split("[^0-9.0-9]*",str4)))
['NbSn']!=['Nb','Sn']
['3']!=['1','3']
str4='Pb1.4Sr4Y1.2Ca0.8Cu4.6O'
...
My code does not work correctly when an element has no explicit count. How can I fix it?
Each element is represented by its atomic symbol in the Periodic Table – e.g. H for hydrogen, Ca for calcium. If more than one atom of a particular element is present, then it's indicated by a number in subscript after the atomic symbol — for example, H2O means there are 2 atoms of hydrogen and one of oxygen.
You started well with the patterns list and then dropped it. The list is probably not needed here (you could use it in a pyparsing grammar), and there is a simpler approach that follows your second idea.
I suggest something like this:
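import re as regex  # assumption: `regex` in the question behaves like the standard re module

str3='Hg0.75SrBa2Ca2Cu3O8'
# The capturing group keeps the element symbols in the split result
splitted = regex.split("([A-Z][a-z]*)", str3)
# Symbols start with an uppercase letter, counts start with a digit; empty strings are skipped
keyss = [s for s in splitted if s and s[0].isupper()]
values = [s for s in splitted if s and s[0].isdigit()]
print(keyss, values)
['Hg', 'Sr', 'Ba', 'Ca', 'Cu', 'O'] ['0.75', '2', '2', '3', '8']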
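Note that the split above separates Sr from Ba, but values still has no entry for the implicit 1 after Sr, so the two lists have different lengths. Here is a minimal sketch of one way to fill that in. It assumes every symbol is one uppercase letter followed by at most one lowercase letter, and that counts look like 3 or 0.75 (no parentheses, no leading dot). The names TOKEN, parse_formula and the symbols parameter are hypothetical and not part of the original code:

```python
import re

# One match per element: a symbol, then an optional integer or decimal count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")

def parse_formula(formula, symbols=None):
    """Return (keys, values); a missing count is reported as '1'."""
    keys, values = [], []
    # findall yields (symbol, count) tuples; count is '' when it is absent.
    for symbol, count in TOKEN.findall(formula):
        if symbols is not None and symbol not in symbols:
            raise ValueError(f"unknown element symbol: {symbol!r}")
        keys.append(symbol)
        values.append(count or "1")
    return keys, values

print(parse_formula("Hg0.75SrBa2Ca2Cu3O8"))
print(parse_formula("NbSn3"))
```

If you want to reject anything that is not in your list, you can pass symbols=set(patterns). Characters that do not match the token pattern are silently skipped, so treat this as a starting point and not a full formula parser.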