I want to get a list of the column headers from a Pandas DataFrame. The DataFrame will come from user input, so I won't know how many columns there will be or what they will be called.
For example, if I'm given a DataFrame like this:
>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7
I would get a list like this:
>>> header_list
['y', 'gdp', 'cap']
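To experiment with the approaches discussed in this thread, the example DataFrame can be rebuilt directly. This is a minimal sketch: the values are copied from the printout above, and the column order y, gdp, cap is taken from its header row.

```python
import pandas as pd

# Rebuild the example DataFrame from the question (values copied from the printout)
my_dataframe = pd.DataFrame({
    "y":   [1, 2, 8, 3, 6, 4, 8, 9, 6, 10],
    "gdp": [2, 3, 7, 4, 7, 8, 2, 9, 6, 10],
    "cap": [5, 9, 2, 7, 7, 3, 8, 10, 4, 7],
})

# The column headers, in their original order, as a plain Python list
header_list = list(my_dataframe.columns)
print(header_list)  # ['y', 'gdp', 'cap']
```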
To get the column names of a Pandas DataFrame, you can type print(df.columns), assuming your DataFrame is named df. Note that this prints a pandas Index object rather than a plain Python list.
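A quick way to see that df.columns is an Index and not a list is to inspect its type. The small DataFrame here is an assumption made only for illustration.

```python
import pandas as pd

# Small illustrative DataFrame with the same headers as the question
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# df.columns is a pandas Index, so print() shows the Index repr, not a list
print(type(df.columns).__name__)  # Index
print(df.columns)

# It is list-like (it has a length and can be indexed) but it is not a list
print(len(df.columns), df.columns[0])  # 3 y
```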
Getting the values of a single column works differently. From the DataFrame, select the column (for example "Name") with the [] operator, which returns a Series object, and use Series.values to get a NumPy array from that Series. Then call the tolist() method provided by the NumPy array to convert it to a list.
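A minimal sketch of the difference between a column's values and the column headers. The DataFrame and its "Name" and "Age" columns are hypothetical, and Series.to_numpy() is used here in place of .values because it always returns a NumPy array.

```python
import pandas as pd

# Hypothetical DataFrame with a "Name" column, used only for illustration
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [30, 25]})

# df["Name"] is a Series; to_numpy() turns its values into a NumPy array
name_array = df["Name"].to_numpy()
# ndarray.tolist() converts the array into a plain Python list of the values
names = name_array.tolist()
print(names)  # ['Alice', 'Bob']

# The column *headers* come from df.columns instead
print(df.columns.tolist())  # ['Name', 'Age']
```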
You can get the column names as a list by doing:
list(my_dataframe.columns.values)
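One practical difference between calling list() on the array and calling its .tolist() method shows up with non-string labels. This sketch assumes integer column labels (the default RangeIndex pandas assigns when no names are given): list() keeps the NumPy scalar type of each element, while ndarray.tolist() converts each element to a native Python object.

```python
import numpy as np
import pandas as pd

# DataFrame without explicit names, so the column labels are 0, 1, 2
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

via_list = list(df.columns.values)        # elements are NumPy integer scalars
via_tolist = df.columns.values.tolist()   # elements are plain Python ints

print(type(via_list[0]), type(via_tolist[0]))
print(via_list == via_tolist)  # True: the labels still compare equal
```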
Alternatively, you can simply use (as shown in Ed Chum's answer):
list(my_dataframe)
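This works because iterating over a DataFrame yields its column labels rather than its rows, as this small sketch (with an illustrative DataFrame) shows:

```python
import pandas as pd

my_dataframe = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# Iterating over a DataFrame yields its column labels, not its rows
for label in my_dataframe:
    print(label)

# So list() collects exactly those labels
headers = list(my_dataframe)
print(headers)  # ['y', 'gdp', 'cap']
```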
There is a built-in method which is the most performant:
my_dataframe.columns.values.tolist()
.columns returns an Index, .columns.values returns an array, and this array has a helper function, .tolist(), that returns a list.
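The chain can be taken apart step by step to see what each attribute returns. This is a sketch with an illustrative DataFrame; for ordinary labels the middle step is a NumPy ndarray, but newer pandas versions that use a dedicated string dtype may return a pandas extension array there instead, which also provides tolist().

```python
import pandas as pd

my_dataframe = pd.DataFrame({"y": [1], "gdp": [2], "cap": [5]})

cols = my_dataframe.columns   # pandas Index holding the labels
arr = cols.values             # the underlying array of labels
header_list = arr.tolist()    # plain Python list built from that array

# Show the type produced at each step of the chain
print(type(cols).__name__, type(arr).__name__, type(header_list).__name__)
print(header_list)  # ['y', 'gdp', 'cap']
```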
If performance is not as important to you, Index objects define a .tolist() method that you can call directly:
my_dataframe.columns.tolist()
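A short sketch of Index.tolist() in use, with illustrative DataFrames. Like ndarray.tolist(), it returns native Python scalars, so integer labels come back as plain ints.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=["y", "gdp", "cap"])

# Index.tolist() returns the labels as a list without going through .values
print(df.columns.tolist())  # ['y', 'gdp', 'cap']

# With the default integer labels, the elements are plain Python ints
df_int = pd.DataFrame([[1, 2, 3]])
int_labels = df_int.columns.tolist()
print(int_labels)  # [0, 1, 2]
```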
The difference in performance is obvious:
%timeit df.columns.tolist()
16.7 µs ± 317 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit df.columns.values.tolist()
1.24 µs ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
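Outside IPython, where %timeit is not available, a comparable measurement can be sketched with the standard-library timeit module. The 20-column DataFrame and the iteration count are arbitrary assumptions; absolute numbers and the size of the gap depend on the machine and on the pandas and NumPy versions, so they may not match the figures quoted here.

```python
import timeit

import pandas as pd

# Arbitrary example DataFrame with 20 columns, chosen only for illustration
df = pd.DataFrame({f"col{i}": range(5) for i in range(20)})

n = 10_000
t_index = timeit.timeit(lambda: df.columns.tolist(), number=n)
t_values = timeit.timeit(lambda: df.columns.values.tolist(), number=n)

# Average time per call in microseconds
print(f"df.columns.tolist():        {t_index / n * 1e6:.2f} µs")
print(f"df.columns.values.tolist(): {t_values / n * 1e6:.2f} µs")
```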
For those who hate typing, you can just call list on df, like so:
list(df)
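As a final sanity check, a small sketch (with an illustrative DataFrame) confirming that the approaches in this thread all produce the same list of headers:

```python
import pandas as pd

df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# Each approach from the thread, mapped to the list it returns
approaches = {
    "list(df)": list(df),
    "list(df.columns.values)": list(df.columns.values),
    "df.columns.tolist()": df.columns.tolist(),
    "df.columns.values.tolist()": df.columns.values.tolist(),
}
for name, result in approaches.items():
    print(f"{name:28} {result}")
```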