
split string into number and text with pandas

The Setup

I have a pandas dataframe that contains a column 'iso' containing chemical isotope symbols, such as '4He', '16O', '197Au'. I want to label many (but not all) isotopes on a plot using the annotate() function in matplotlib. The label format should have the atomic mass in superscript. I can do this with the LaTeX style formatting:

axis.annotate('$^{4}$He', xy=(x, y), xycoords='data')

I could write dozens of annotate() statements like the one above for each isotope I want to label, but I'd rather automate.

The Question

How can I extract the isotope number and name from my iso column?

With those pieces extracted I can make the labels. Let's say we put them into the variables Num and Sym. Now I can loop over my isotopes and do something like this:

for i in list_of_isotopes:
  (Num, Sym) = df[df.iso==i].iso.str.MISSING_STRING_METHOD(???)
  axis.annotate('$^{%s}$%s' %(Num, Sym), xy=(x[Num], y[Num]), xycoords='data')

Presumably there is a pandas string method that I can drop into the above, but I'm having trouble coming up with a solution. I've been trying split() and extract() with a few different patterns, but can't get the desired result.

Paul T. asked Jan 08 '23 16:01



1 Answer

This is my answer using split. The regexp can probably be improved; I'm not very good at that sort of thing :-)

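(\d+) captures the leading digits (the mass number), and ([A-Za-z]+) captures the letters (the element symbol). Because the pattern contains capture groups, split keeps the captured text in its output, so the pieces land in columns 1 and 2, with empty strings in columns 0 and 3.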

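import pandas as pd

df = pd.DataFrame({'iso': ['4He', '16O', '197Au']})
# Raw string so that \d reaches the regex engine unchanged
result = df['iso'].str.split(r'(\d+)([A-Za-z]+)', expand=True)
# Keep only the two captured groups (columns 0 and 3 are empty strings)
result = result.loc[:,[1,2]]
result.rename(columns={1:'x', 2:'y'}, inplace=True)
print(result)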

Produces

     x   y
0    4  He
1   16   O
2  197  Au
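
Since the question also mentions extract(), here is a minimal sketch of that route, assuming every entry in iso is digits followed by letters. The group names num and sym and the labels variable are my own choices for illustration, not something from the question. Named groups become the column names, so no slicing or renaming is needed, and the label strings can be built in one vectorised step:

```python
import pandas as pd

df = pd.DataFrame({'iso': ['4He', '16O', '197Au']})

# Named capture groups become the column names of the returned DataFrame.
# 'num' and 'sym' are illustrative names, not taken from the question.
parts = df['iso'].str.extract(r'^(?P<num>\d+)(?P<sym>[A-Za-z]+)$')

# Build one LaTeX-style label per row, e.g. '$^{4}$He'.
labels = ('$^{' + parts['num'] + '}$' + parts['sym']).tolist()
print(labels)
```

Each label can then be passed to axis.annotate() together with the matching x and y values. Rows that do not match the pattern come back as NaN, so it may be worth checking for those before plotting.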
Romain answered Jan 18 '23 17:01
