I am working with a data frame that has a structure something like the following:
In[75]: df.head(2)
Out[75]:
statusdata participant_id association latency response \
0 complete CLIENT-TEST-1476362617727 seeya 715 dislike
1 complete CLIENT-TEST-1476362617727 welome 800 like
stimuli elementdata statusmetadata demo$gender demo$question2 \
0 Sample B semi_imp complete male 23
1 Sample C semi_imp complete female 23
I want to be able to run a query string against the column demo$gender
.
I.e,
df.query("demo$gender=='male'")
But this has a problem with the $
sign. If I replace the $
sign with another delimited (like -
) then the problem persists. Can I fix up my query string to avoid this problem. I would prefer not to rename the columns as these correspond tightly with other parts of my application.
I really want to stick with a query string as it is supplied by another component of our tech stack and creating a parser would be a heavy lift for what seems like a simple problem.
Thanks in advance.
For the interested here is a simple proceedure I used to accomplish the task:
# Identify invalid column names
invalid_column_names = [x for x in list(df.columns.values) if not x.isidentifier() ]
# Make replacements in the query and keep track
# NOTE: This method fails if the frame has columns called REPL_0 etc.
replacements = dict()
for cn in invalid_column_names:
r = 'REPL_'+ str(invalid_column_names.index(cn))
query = query.replace(cn, r)
replacements[cn] = r
inv_replacements = {replacements[k] : k for k in replacements.keys()}
df = df.rename(columns=replacements) # Rename the columns
df = df.query(query) # Carry out query
df = df.rename(columns=inv_replacements)
Which amounts to identifying the invalid column names, transforming the query and renaming the columns. Finally we perform the query and then translate the column names back.
Credit to @chrisb for their answer that pointed me in the right direction
With the most recent version of pandas, you can esscape a column's name that contains special characters with a backtick (`)
df.query("`demo$gender` == 'male'")
Another possibility is clean the columns names as a previous step in your process, replacing special characters by some other more appropriate.
For instance:
(df
.rename(columns = lambda value: value.replace('$', '_'))
.query("demo_gender == 'male'")
)
The current implementation of query
requires the string to be a valid python expression, so column names must be valid python identifiers. Your two options are renaming the column, or using a plain boolean filter, like this:
df[df['demo$gender'] =='male']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With