
UndefinedVariableError when querying pandas DataFrame

Tags: python, pandas

I am attempting to create a graph by querying values in a pandas DataFrame.

In this line:

data1 = [np.array(df.query('type == i')['continuous'])
         for i in ('Type1', 'Type2', 'Type3', 'Type4')]

I get the error:

UndefinedVariableError: name 'i' is not defined

What am I missing?

lbug asked Mar 16 '15 19:03


2 Answers

The i in your query expression

df.query('type == i')

is just the literal name i inside the query string; the value of your Python variable i is never substituted into it. Since there are no enclosing quotes around it, pandas interprets it as a name to look up in your DataFrame, such as another column, i.e. it looks for rows where

df['type'] == df['i']

Since there is no i column, you get an UndefinedVariableError.
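To make the lookup behaviour concrete, here is a minimal sketch. The DataFrame and its `other` column are made-up illustration data, not taken from the question. It assumes a pandas version in which `UndefinedVariableError` subclasses `NameError`, which is why the example catches `NameError` rather than importing the exception from a version-specific path.

```python
import pandas as pd

# Hypothetical data: 'type' and 'other' are both columns of the frame.
df = pd.DataFrame({
    "type": ["Type1", "Type2", "Type3"],
    "other": ["Type1", "TypeX", "Type3"],
    "continuous": [0.1, 0.2, 0.3],
})

# A bare name in the expression is resolved as a column,
# so this compares the two columns row by row.
same = df.query("type == other")

# 'i' is not a column, so the lookup fails even though
# a Python variable named i exists in this scope.
i = "Type1"
try:
    df.query("type == i")
    error_name = None
except NameError as exc:  # UndefinedVariableError is a NameError subclass
    error_name = type(exc).__name__
```

Here `same` keeps only the rows where both columns hold the same value, while the second query raises because pandas has nothing called `i` to resolve.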

It looks like you intended to query where the values in the type column are equal to the string variable named i, i.e. where

df['type'] == 'Type1'
df['type'] == 'Type2' # etc.

In this case you need to actually insert the value of i into the query expression:

df.query('type == "%s"' % i)

The extra set of quotes is necessary if 'Type1', 'Type2', etc. are values within the type column, but not if they are the names of other columns in the DataFrame.
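Applied to the comprehension from the question, the formatting approach might look like the sketch below. The sample data is invented for illustration. The second variant uses the `@` prefix that `DataFrame.query` provides for referring to Python variables; it is not part of the answer above and is shown only as an alternative. The helper name `select_type` is hypothetical, and the lookup is wrapped in a function so that the referenced variable is an ordinary local.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({
    "type": ["Type1", "Type2", "Type1", "Type3"],
    "continuous": [1.0, 2.0, 3.0, 4.0],
})
types = ("Type1", "Type2", "Type3", "Type4")

# Approach from the answer: format the value of i into the expression,
# with quotes so pandas reads it as a string literal.
data1 = [np.array(df.query('type == "%s"' % i)["continuous"]) for i in types]


def select_type(frame, value):
    """Hypothetical helper: '@value' refers to the local Python variable."""
    return np.array(frame.query("type == @value")["continuous"])


# Alternative: let pandas read the Python variable directly.
data2 = [select_type(df, i) for i in types]
```

Both lists contain one array per requested type; a type with no matching rows, such as 'Type4' here, yields an empty array. The `@` form avoids quoting problems when the value itself contains quote characters.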

ali_m answered Nov 08 '22 00:11


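I know this is late, but maybe it helps somebody: the value of i needs double quotes around it inside the query string:

data1 = [np.array(df.query('type == "%s"' % i)['continuous'])
         for i in ('Type1', 'Type2', 'Type3', 'Type4')]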

Oleh Pryshliak answered Nov 07 '22 23:11