I noticed that if I type df.column_name, I can autocomplete column_name with Tab in an IPython notebook.
Now, the proper syntax for doing something to a column would be df['column_name'], where I am unable to autocomplete (I assume because it is a string?). Is there any other notation or way to simplify typing out column names? I am essentially looking for a solution that would allow me to tab-autocomplete the column name within df['column_name'].
I've found the following method useful. It creates a namedtuple whose fields are the names of all the variables in the data frame, each field holding its own name as a string.
For example, consider the following data frame containing 2 variables called "variable_1" and "variable_2":
from collections import namedtuple
from pandas import DataFrame
import numpy as np
df = DataFrame({'variable_1':np.arange(5),'variable_2':np.arange(5)})
The following code creates a namedtuple called "var":
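def ntuples():
    # Read the current column names of the global data frame df
    list_of_names = list(df.columns)
    # Map each name to itself so every field holds its own name as a string
    list_of_names_dict = {x: x for x in list_of_names}
    # namedtuple requires every name to be a valid Python identifier
    Varnames = namedtuple('Varnames', list_of_names)
    return Varnames(**list_of_names_dict)

var = ntuples()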
In a notebook, when I write var. and press Tab, the names of all the variables in the data frame df are displayed. Writing var.variable_1 is equivalent to writing 'variable_1', so df[var.variable_1] works.
The reason I define a function to do this is that you will often add new variables to a data frame. To pick up the new variables in the namedtuple "var", simply call the function again and reassign the result, var = ntuples(), and you are good to go.
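One limitation worth noting: namedtuple only accepts field names that are valid Python identifiers, so the function above raises a ValueError when a column name contains a space, starts with a digit or an underscore, or is a Python keyword. The following is a minimal sketch of one way around that, not part of the original method. The helper name make_var and the "c_" prefix are hypothetical choices made for illustration. It takes any iterable of column labels (for example df.columns), derives a safe attribute name for each, and stores the original label as the value.

```python
import keyword
import re
from collections import namedtuple


def make_var(columns):
    """Return a namedtuple whose fields hold the original column labels.

    `columns` is any iterable of labels, e.g. df.columns.
    (Hypothetical helper; the name and the "c_" prefix are illustrative.)
    """
    originals = list(columns)
    fields = []
    for label in originals:
        # Replace every character that is not an ASCII letter, digit or "_"
        field = re.sub(r"\W", "_", str(label), flags=re.ASCII)
        # namedtuple rejects empty names, leading digits or underscores,
        # and Python keywords, so prefix those cases
        if (not field or field[0].isdigit() or field[0] == "_"
                or keyword.iskeyword(field)):
            field = "c_" + field
        fields.append(field)
    Varnames = namedtuple("Varnames", fields)
    # Each field keeps the untouched label, so df[var.field] still works
    return Varnames(*originals)


# Demonstration with plain strings standing in for df.columns
var = make_var(["variable_1", "unit price", "2020", "class"])
print(var.unit_price)
```

With a data frame this would be called as var = make_var(df.columns), after which df[var.unit_price] selects the column labelled 'unit price'. Two labels that sanitize to the same attribute name would still raise a ValueError, so this sketch does not cover that case.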