
Pandas query function not working with spaces in column names

I have a DataFrame with spaces in its column names, and I am trying to use the query method to filter it. It works fine for the 'c' column, but I get an error for 'a b':

import pandas as pd
a = pd.DataFrame(columns=["a b", "c"])
a["a b"] = [1,2,3,4]
a["c"] = [5,6,7,8]
a.query('a b==5')

For this I am getting this error:

a b ==5
  ^
SyntaxError: invalid syntax

I don't want to replace the spaces with other characters such as '_'.

One workaround is pandasql, where a column name can be wrapped in brackets, for example [a b].

asked Jun 05 '18 10:06 by Bhushan Pant


People also ask

How do you reference column names with spaces in pandas?

You can refer to column names that contain spaces or operators by surrounding them in backticks. This way you can also escape names that start with a digit, or those that are a Python keyword.

How do I remove spaces from a DataFrame column name?

To strip leading and trailing whitespace from column names, you can use str.strip, str.lstrip and str.rstrip.
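A minimal sketch of that approach, assuming a small made-up frame whose column names carry stray padding. Note that stripping only touches the ends of each name, so a space inside a name such as "a b" stays in place:

```python
import pandas as pd

# Hypothetical frame with padded column names (not from the question).
df = pd.DataFrame({" a b ": [1, 2], "c ": [3, 4]})

# The .str accessor on the column Index applies strip() to every name.
df.columns = df.columns.str.strip()

print(list(df.columns))  # ['a b', 'c']
```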

How do I fix column names in pandas?

The pandas rename function renames columns, and not all of the column names need to be changed. To change column names with rename, specify a mapper: a dictionary with the old names as keys and the new names as values.
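For cases where renaming is acceptable, a rough sketch might look like the following. The replacement name `a_b` is only an illustration; columns missing from the mapper are left unchanged, and rename returns a new frame rather than modifying the original:

```python
import pandas as pd

df = pd.DataFrame({"a b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

# Only "a b" appears in the mapper, so "c" keeps its name.
renamed = df.rename(columns={"a b": "a_b"})

# The new name is a valid identifier, so query() needs no quoting.
result = renamed.query("a_b == 3")
```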


2 Answers

From pandas 0.25 onward, you can escape column names with backticks, so you can write:

a.query('`a b` == 5')  
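As a self-contained sketch, assuming pandas 0.25 or later, the frame from the question can be rebuilt and queried as below. The value 3 is used instead of 5 only because the "a b" column in the question holds 1 to 4, so `== 5` would return an empty frame:

```python
import pandas as pd

# Same data as in the question, built in one step.
df = pd.DataFrame({"a b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

# Backticks mark "a b" as a single column name inside the query string.
result = df.query("`a b` == 3")

# No row has "a b" equal to 5, so this selection is empty.
empty = df.query("`a b` == 5")

print(result)
```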
answered Oct 01 '22 02:10 by Jarno


Pandas 0.25+

As described in the pandas 0.25 release notes:

DataFrame.query() and DataFrame.eval() now supports quoting column names with backticks to refer to names with spaces (GH6508)

So you can use:

a.query('`a b`==5') 
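Since the release note covers both query() and eval(), here is a short sketch of each, again assuming pandas 0.25 or later. The conditions and the variable name `threshold` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

# A backtick-quoted name can be mixed with an ordinary one.
subset = df.query("`a b` > 1 and c < 8")

# Local Python variables are referenced with the @ prefix.
threshold = 2
above = df.query("`a b` > @threshold")

# eval() accepts the same quoting; this returns a Series of row sums.
total = df.eval("`a b` + c")
```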

Pandas pre-0.25

You cannot use pd.DataFrame.query if you have whitespace in your column name. Consider what would happen if you had columns named a, b and a b; there would be ambiguity as to what you require.

Instead, you can use pd.DataFrame.loc:

df = df.loc[df['a b'] == 5] 

Since you are only filtering rows, you can omit the .loc accessor altogether:

df = df[df['a b'] == 5] 
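The same boolean-mask approach extends to several conditions, which is where query() is often used. A minimal sketch, with made-up conditions, that works regardless of pandas version:

```python
import pandas as pd

df = pd.DataFrame({"a b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

# Each comparison needs its own parentheses, because & binds more
# tightly than comparison operators such as >= and <.
mask = (df["a b"] >= 2) & (df["c"] < 8)

# Plain indexing and .loc select the same rows for a boolean mask.
filtered = df[mask]
same = df.loc[mask]
```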
answered Oct 01 '22 01:10 by jpp