Let's suppose I create a DataFrame with named columns and query it, e.g.:
pd.DataFrame([[1,2],[3,4],[5,6]],columns=['a','b']).query('a>1')
This will give me
   a  b
1  3  4
2  5  6
But when the DataFrame is large and has no column names, how can I query a column by its integer index?
I tried passing the number directly, but that is not the way to do it:
pd.DataFrame([[1,2],[3,4],[5,6]]).query('0>1') # This is what I tried.
How do I tell query that 0 is the column name?
Expected Output:
   0  1
1  3  4
2  5  6
In R's dplyr, the select_if() function returns the numeric columns when it is called with the data frame and the is.numeric predicate, i.e. select_if(dataframe, is.numeric), where dataframe is the input data frame.
You can get the column names of a pandas DataFrame with df.columns.values and pass the result to Python's list() function to get them as a list; once you have the list, you can print it with print().
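As a small illustration of the line above (the names df and cols are only placeholders), a frame built without column names gets integer labels. That is also why writing 0 in a query string is read as the number 0 rather than as a column:

```python
import pandas as pd

# No columns= argument, so pandas assigns the integer labels 0 and 1.
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]])

# df.columns.values is a NumPy array; list() turns it into a plain list.
cols = list(df.columns.values)
print(cols)
```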
Python's string isnumeric() method returns True if all the characters in the string are numeric (such as 0-9), otherwise False.
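A quick check of that behaviour on a few sample strings (the samples and the name results are chosen here only for illustration):

```python
samples = ["123", "12.5", "-1", ""]

# isnumeric() is a str method; signs and decimal points are not numeric characters.
results = {s: s.isnumeric() for s in samples}

for text, flag in results.items():
    print(repr(text), flag)
```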
Since query is still under development, one possible workaround is to monkey-patch pd.DataFrame with a method that evaluates expressions referring to self, i.e.:
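def query_cols(self, expr):
    if 'self' in expr:
        # eval() runs expr as plain Python, so only pass trusted strings
        return self[eval(expr)]
    else:
        # no reference to self: defer to the regular query()
        return self.query(expr)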
pd.DataFrame.query_cols = query_cols
pd.DataFrame([[1,2],[3,4],[5,6]]).query_cols('self[1] > 3')
   0  1
1  3  4
2  5  6
pd.DataFrame([[1,2],[3,4],[5,6]]).query_cols('self[1] == 4')
   0  1
1  3  4
pd.DataFrame([[1,2],[3,4],[5,6]],columns=['a','b']).query_cols('a > 3')
   a  b
2  5  6
This is a simple trick and doesn't suit all cases; the answer will be updated when the issue with query is resolved.
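If the patch feels too hacky, here is a minimal sketch of the same filter without eval(). It uses plain boolean indexing instead of the approach above, and the "c" prefix in the last variant is just a hypothetical naming choice to give query string labels it can parse:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]])

# Boolean mask built from the column labelled 0.
by_mask = df[df[0] > 1]

# Same filter with a callable, usable in a method chain without a named variable.
by_loc = pd.DataFrame([[1, 2], [3, 4], [5, 6]]).loc[lambda d: d[0] > 1]

# Rename the integer labels to strings (hypothetical "c" prefix), then query as usual.
by_query = df.rename(columns=lambda c: f"c{c}").query("c0 > 1")

print(by_mask)
```

The first two keep the original integer column labels; the renamed variant returns columns c0 and c1, so you would have to rename them back if the labels matter.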