I have a dataframe with spaces in its column names. I am trying to use the query
method to get the results. It works fine with the 'c' column, but I get an error for 'a b':
import pandas as pd
a = pd.DataFrame(columns=["a b", "c"])
a["a b"] = [1,2,3,4]
a["c"] = [5,6,7,8]
a.query('a b==5')
For this I am getting this error:
a b ==5
  ^
SyntaxError: invalid syntax
I don't want to replace the spaces with other characters such as '_'.
There is one hack using pandasql: put the column name inside brackets, for example [a b].
You can refer to column names that contain spaces or operators by surrounding them in backticks. This way you can also escape names that start with a digit, or those that are a Python keyword.
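As a minimal sketch of the digit and keyword cases mentioned above, the snippet below uses a made-up frame whose column names (`1st col`, `class`) are purely illustrative. It assumes a reasonably recent pandas release; older versions may only accept backticks for names containing spaces.

```python
import pandas as pd

# Hypothetical frame: one name starts with a digit, the other is a Python keyword.
df = pd.DataFrame({"1st col": [1, 2, 3], "class": ["x", "y", "x"]})

# Backticks make query() treat each quoted name as a single column reference.
digit_rows = df.query("`1st col` > 1")
keyword_rows = df.query("`class` == 'x'")

print(digit_rows)
print(keyword_rows)
```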
To strip whitespace from column names, you can use str.strip, str.lstrip and str.rstrip. Note that these only remove leading and trailing whitespace, not spaces inside a name.
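A short sketch of stripping column names, assuming a hypothetical frame whose headers picked up stray padding. It applies `str.strip` through the `.str` accessor on the column index.

```python
import pandas as pd

# Hypothetical frame with padded column names.
df = pd.DataFrame({" a b ": [1, 2], "c ": [3, 4]})

# Strip leading and trailing whitespace from every column name.
df.columns = df.columns.str.strip()

print(list(df.columns))
```

The interior space in "a b" is left alone, so this does not by itself make the name usable in an unquoted query expression.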
You can also rename columns with the pandas rename function, and not all the column names need to be changed. To change column names with rename, specify a mapper: a dictionary with the old names as keys and the new names as values.
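For illustration, here is a minimal sketch of renaming with a dictionary mapper, using the question's frame. The new name `a_b` is only an example.

```python
import pandas as pd

df = pd.DataFrame({"a b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

# Only the keys present in the mapper are renamed; "c" is left untouched.
renamed = df.rename(columns={"a b": "a_b"})

print(list(renamed.columns))
```

rename returns a new dataframe by default, so `df` keeps its original column names.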
From pandas 0.25 onward, you can escape column names with backticks, so you can do:
a.query('`a b` == 5')
As described in the release notes:
DataFrame.query() and DataFrame.eval() now supports quoting column names with backticks to refer to names with spaces (GH6508)
So you can use:
a.query('`a b`==5')
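A self-contained sketch of the backtick form on the question's data, assuming pandas 0.25 or later. The value 3 is chosen here only because 5 does not occur in the "a b" column.

```python
import pandas as pd

df = pd.DataFrame({"a b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

# Backticks let query() parse "a b" as one column name.
match = df.query("`a b` == 3")

# The question's filter value, 5, is not in "a b", so this result is empty.
empty = df.query("`a b` == 5")

print(match)
print(empty)
```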
Before pandas 0.25, you cannot use pd.DataFrame.query if you have whitespace in your column name. Consider what would happen if you had columns named a, b and a b; there would be ambiguity as to what you require.
Instead, you can use pd.DataFrame.loc:
df = df.loc[df['a b'] == 5]
Since you are only filtering rows, you can omit the .loc accessor altogether:
df = df[df['a b'] == 5]
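As a quick check that the two forms select the same rows, here is a small sketch on the question's data. The comparison value 3 is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

# Build the boolean mask once; it works regardless of spaces in the name.
mask = df["a b"] == 3

via_loc = df.loc[mask]      # explicit row selection
via_brackets = df[mask]     # shorthand for the same row filter

print(via_loc.equals(via_brackets))
```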