Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to store formulas, instead of values, in pandas DataFrame

Tags:

python

pandas

Is it possible to work with pandas DataFrame as with an Excel spreadsheet: say, by entering a formula in a column so that when variables in other columns change, the values in this column change automatically? Something like:

a  b  c
2  3  =a+b

And so when I update 2 or 3, the column c also updates automatically.

PS: It's clearly possible to write a function to return a+b, but is there any built-in functionality in pandas or in other Python libraries to work with matrices this way?

like image 224
Anton Tarasenko Avatar asked Aug 02 '13 19:08

Anton Tarasenko


3 Answers

This will work in 0.13 (still in development)

In [19]: df = DataFrame(randn(10,2),columns=list('ab'))

In [20]: df
Out[20]: 
          a         b
0  0.958465  0.679193
1 -0.769077  0.497436
2  0.598059  0.457555
3  0.290926 -1.617927
4 -0.248910 -0.947835
5 -1.352096 -0.568631
6  0.009125  0.711511
7 -0.993082 -1.440405
8 -0.593704  0.352468
9  0.523332 -1.544849

This will be possible as 'a + b' (soon)

In [21]: formulas = { 'c' : 'df.a + df.b' }

In [22]: def update(df,formulas):
               for k, v in formulas.items():
                  df[k] = pd.eval(v)


In [23]: update(df,formulas)

In [24]: df
Out[24]: 
          a         b         c
0  0.958465  0.679193  1.637658
1 -0.769077  0.497436 -0.271642
2  0.598059  0.457555  1.055614
3  0.290926 -1.617927 -1.327001
4 -0.248910 -0.947835 -1.196745
5 -1.352096 -0.568631 -1.920726
6  0.009125  0.711511  0.720636
7 -0.993082 -1.440405 -2.433487
8 -0.593704  0.352468 -0.241236
9  0.523332 -1.544849 -1.021517

You could implement a hook into setitem on the data frame to have this type of function called automatically. But pretty tricky. You didn't specify how the frame is updated in the first place. Would probably be easiest to simply call the update function after you change the values

like image 151
Jeff Avatar answered Nov 06 '22 15:11

Jeff


I don't know it it is what you want, but I accidentally discovered that you can store xlwt.Formula objects in the DataFrame cells, and then, using DataFrame.to_excel method, export the DataFrame to excel and have your formulas in it:

import pandas
import xlwt

formulae=[]
formulae.append(xlwt.Formula('SUM(F1:F5)'))
formulae.append(xlwt.Formula('SUM(G1:G5)'))
formulae.append(xlwt.Formula('SUM(H1:I5)'))
formulae.append(xlwt.Formula('SUM(I1:I5)'))

df=pandas.DataFrame(formula)
df.to_excel('FormulaTest.xls')

Try it...

like image 21
jtornero Avatar answered Nov 06 '22 16:11

jtornero


There's currently no way to do this exactly in the way that you describe.

In pandas 0.13 there will be a new DataFrame.eval method that will allow you to evaluate an expression in the "context" of a DataFrame. For example, you'll be able to df['c'] = df.eval('a + b').

like image 34
Phillip Cloud Avatar answered Nov 06 '22 16:11

Phillip Cloud