What I would like to do is to parse an expression such this one:
result = A + B + sqrt(B + 4)
Where A and B are columns of a dataframe. So I would have to parse the expresion like this in order to get the result:
new_col = df.B + 4
result = df.A + df.B + new_col.apply(sqrt)
Where df
is the dataframe.
I have tried with re.sub
but it would be good only to replace the column variables (not the functions) like this:
import re
def repl(match):
inner_word = match.group(1)
new_var = "df['{}']".format(inner_word)
return new_var
eq = 'A + 3 / B'
new_eq = re.sub('([a-zA-Z_]+)', repl, eq)
result = eval(new_eq)
So, my questions are:
ast
module?Using DataFrame. value() property you can extract column values of pandas DataFrame based on another column. The value() property is used to get a Numpy representation of the DataFrame. Only the values in the DataFrame will be returned, the axes labels will be removed. You can put [0] at the end to access the value.
You can use the loc and iloc functions to access columns in a Pandas DataFrame. Let's see how. If we wanted to access a certain column in our DataFrame, for example the Grades column, we could simply use the loc function and specify the name of the column in order to retrieve it.
get_value() function is used to quickly retrieve the single value in the data frame at the passed column and index. The input to the function is the row label and the column label.
Pandas DataFrames do have an eval
function. Using your example equation:
import pandas as pd
# create an example DataFrame to work with
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
# define equation
eq = 'A + 3 / B'
# actual computation
df.eval(eq)
# more complicated equation
eq = "A + B + sqrt(B + 4)"
df.eval(eq)
Keep in mind that eval
allows to run arbitrary code, which can make you vulnerable to code injection if you pass user input to this function.
Following the example provided by @uuazed, a faster way would be using numexpr
import pandas as pd
import numpy as np
import numexpr as ne
df = pd.DataFrame(np.random.randn(int(1e6), 2), columns=['A', 'B'])
eq = "A + B + sqrt(B + 4)"
timeit df.eval(eq)
# 15.9 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit A=df.A; B=df.B; ne.evaluate(eq)
# 6.24 ms ± 396 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
numexpr may also have more supported operations
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With