
How to parse and evaluate a math expression with Pandas Dataframe columns?

What I would like to do is parse an expression such as this one:

result = A + B + sqrt(B + 4)

where A and B are columns of a dataframe. To get the result, I would have to translate the expression into something like this:

new_col = df.B + 4
result = df.A + df.B + new_col.apply(sqrt)

where df is the dataframe.

I have tried re.sub, but it only works for replacing the column variables (not the functions), like this:

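import re
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})  # example data

def repl(match):
    # wrap each identifier in a column lookup, e.g. A -> df['A']
    inner_word = match.group(1)
    new_var = "df['{}']".format(inner_word)
    return new_var

eq = 'A + 3 / B'
new_eq = re.sub(r'([a-zA-Z_]+)', repl, eq)
result = eval(new_eq)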

So, my questions are:

  • Is there a Python library to do this? If not, how can I achieve this in a simple way?
  • Could a recursive function be the solution?
  • Would using "reverse polish notation" simplify the parsing?
  • Would I have to use the ast module?
ChesuCR asked Nov 06 '17 11:11



2 Answers

pandas DataFrames have an eval method that does this. Using your example equation:

import pandas as pd
# create an example DataFrame to work with
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
# define equation
eq = 'A + 3 / B'
# actual computation
df.eval(eq)

# more complicated equation
eq = "A + B + sqrt(B + 4)"
df.eval(eq)
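
If the goal is a new column rather than a bare Series, the expression string can contain an assignment. The following is a minimal sketch using the same example data; the column name result comes from the question, and the names out and expected are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# An assignment inside the expression returns a copy of df
# with the new column appended; df itself is left unchanged.
out = df.eval("result = A + B + sqrt(B + 4)")

# The same computation written out by hand, for comparison.
expected = df["A"] + df["B"] + np.sqrt(df["B"] + 4)
print(out)
```

Passing inplace=True to df.eval should modify df directly instead of returning a copy.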

Warning

Keep in mind that eval can run arbitrary code, which makes you vulnerable to code injection if you pass user input to this function.
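
One way to reduce that risk, and a possible use of the ast module mentioned in the question, is to check the expression against a whitelist before evaluating it. This is only a sketch under stated assumptions: check_expression, safe_eval, ALLOWED_FUNCS and ALLOWED_NODES are hypothetical names, not part of pandas, and the whitelist would need adjusting to the operations you actually want to allow.

```python
import ast

import pandas as pd

# Hypothetical whitelist of function names and syntax nodes.
ALLOWED_FUNCS = {"sqrt", "log", "exp"}
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Name, ast.Load,
    ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow,
    ast.USub, ast.UAdd,
)


def check_expression(expr, columns):
    """Raise ValueError unless expr uses only whitelisted syntax and names."""
    allowed_names = set(columns) | ALLOWED_FUNCS
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("disallowed syntax: " + type(node).__name__)
        if isinstance(node, ast.Call):
            # only plain calls such as sqrt(...), no attribute access
            if not isinstance(node.func, ast.Name) or node.func.id not in ALLOWED_FUNCS:
                raise ValueError("disallowed function call")
        elif isinstance(node, ast.Name) and node.id not in allowed_names:
            raise ValueError("unknown name: " + node.id)
        elif isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError("only numeric constants are allowed")


def safe_eval(df, expr):
    """Validate expr first, then hand it to DataFrame.eval."""
    check_expression(expr, df.columns)
    return df.eval(expr)


df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
result = safe_eval(df, "A + B + sqrt(B + 4)")
```

An expression such as "__import__('os').system('ls')" is rejected before it ever reaches eval, because attribute access is not in the whitelist.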

uuazed answered Oct 23 '22 11:10


Following the example provided by @uuazed, a faster way is to use numexpr directly:

import pandas as pd
import numpy as np
import numexpr as ne

df = pd.DataFrame(np.random.randn(int(1e6), 2), columns=['A', 'B'])
eq = "A + B + sqrt(B + 4)"
# %timeit is an IPython/Jupyter magic command
%timeit df.eval(eq)
# 15.9 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit A=df.A; B=df.B; ne.evaluate(eq)
# 6.24 ms ± 396 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

numexpr may also support more operations than df.eval.
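
Outside of a timing session, it may be cleaner to pass the columns to numexpr explicitly instead of creating variables named A and B in the calling scope. A minimal sketch, assuming numexpr is installed; the dictionary name columns and the small example frame are only illustrative:

```python
import numexpr as ne
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
eq = "A + B + sqrt(B + 4)"

# Map each column name to its NumPy array so that numexpr does not
# depend on same-named variables existing in the calling scope.
columns = {name: df[name].to_numpy() for name in df.columns}

# ne.evaluate returns a NumPy array; wrap it to keep the original index.
result = pd.Series(ne.evaluate(eq, local_dict=columns), index=df.index)
print(result)
```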

avelo answered Oct 23 '22 10:10