Rather than explicitly passing DataFrame columns to the function below, I'd like the option of passing column names together with the DataFrame itself, but so far without success.
The code below gives a "ValueError: Wrong number of dimensions" error.
I've tried another couple of ideas but they all lead to errors of one form or another.
Apart from this issue, when the parameters are passed as explicit DataFrame columns, p as a single column, and q as a list of columns, the code works as desired. Is there a clever (or indeed any) way of passing in the data frame so the columns can be assigned to it implicitly?
def cdf(p, q=[], datafr=None):
    if datafr!=None:
        p = datafr[p]
        for i in range(len(q)):
            q[i]=datafr[q[i]]
    ...
    (calculate conditional probability tables for p|q)
To summarize:
current usage:
cdf(df['var1'], [df['var2'], df['var3']])
desired usage:
cdf('var1', ['var2', 'var3'], datafr=df)
Change if datafr != None: to if datafr is not None:
Pandas evaluates datafr != None element by element, so the result is not a single True or False and the if statement throws an error. The is operator instead checks whether datafr and None are the same object, a more stringent identity check that always yields a plain boolean.
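To see the difference, here is a small sketch. The DataFrame contents and the helper name truthiness_fails are made up for illustration, and the exact error message depends on your pandas version.

```python
import pandas as pd

# Illustrative data only; any DataFrame behaves the same way here.
df = pd.DataFrame({"var1": [1, 2], "var2": [3, 4]})

def truthiness_fails(frame):
    """Return True if `if frame != None:` raises a ValueError."""
    try:
        if frame != None:  # element-wise comparison, not a single bool
            pass
    except ValueError:
        return True
    return False

print(truthiness_fails(df))  # the != None test cannot be used in an if
print(df is not None)        # identity check, always a plain bool
```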
Additional tips:
Python can iterate over a list directly, so you rarely need range(len(...)):
#change this
for i in range(len(q)):
    q[i] = datafr[q[i]]
#to this:
q = [datafr[name] for name in q]
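If q is a required parameter, don't give it a default (q=[]) when defining your function. If it is an optional parameter, ignore this tip, but note that a mutable default such as [] is shared between calls, so q=None is the safer choice.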
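Putting these tips together, the column lookup could be sketched as below. The helper name resolve_columns and the sample data are hypothetical, and the sketch only resolves the columns; the conditional probability calculation from the question is left out.

```python
import pandas as pd

def resolve_columns(p, q=None, datafr=None):
    """Return p and q as columns, looking names up in datafr when it is given."""
    # Build a fresh list so the caller's list is never modified.
    q = [] if q is None else list(q)
    if datafr is not None:
        p = datafr[p]
        q = [datafr[name] for name in q]
    return p, q

# Illustrative data only.
df = pd.DataFrame({"var1": [0, 1, 1], "var2": [1, 0, 1], "var3": [1, 1, 0]})
names = ["var2", "var3"]

p, q = resolve_columns("var1", names, datafr=df)
print(p.name, [s.name for s in q])
print(names)  # still the original list of strings
```

Both call styles from the question keep working: passing Series objects without datafr returns them unchanged, and passing names with datafr looks them up.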
Python can match the arguments in a function call to the parameters in the definition by position, so the datafr keyword is optional here.
cdf('var1', ['var2', 'var3'], datafr=df)
#can be written as:
cdf('var1', ['var2', 'var3'], df)