I'm using df.columns.values to make a list of column names which I then iterate over and make charts, etc... but when I set this up I overlooked the non-numeric columns in the df. Now, I'd much rather not simply drop those columns from the df (or a copy of it). Instead, I would like to find a slick way to eliminate them from the list of column names.
Now I have:
names = df.columns.values
what I'd like to get to is something that behaves like:
names = df.columns.values(column_type=float64)
Is there any slick way to do this? I suppose I could make a copy of the df, and drop those non-numeric columns before doing columns.values, but that strikes me as clunky.
Welcome any inputs/suggestions. Thanks.
Pandas Select columns based on their data typePandas dataframe has the function select_dtypes , which has an include parameter. Specify the datatype of the columns which you want select using this parameter. This can be useful to you if you want to select only specific data type columns from the dataframe.
You can use the loc and iloc functions to access columns in a Pandas DataFrame. Let's see how. If we wanted to access a certain column in our DataFrame, for example the Grades column, we could simply use the loc function and specify the name of the column in order to retrieve it.
For categorical data you can use Pandas string functions to filter the data. The startswith() function returns rows where a given column contains values that start with a certain value, and endswith() which returns rows with values that end with a certain value.
Someone will give you a better answe than this possibly, but one thing I tend to do is if all my numeric data are int64
or float64
objects, then you can create a dict of the column data types and then use the values to create your list of columns.
So for example, in a dataframe where I have columns of type float64
, int64
and object
firstly you can look at the data types as so:
DF.dtypes
and if they conform to the standard whereby the non-numeric columns of data are all object
types (as they are in my dataframes), then you can do the following to get a list of the numeric columns:
[key for key in dict(DF.dtypes) if dict(DF.dtypes)[key] in ['float64', 'int64']]
Its just a simple list comprehension. Nothing fancy. Again, though whether this works for you will depend upon how you set up you dataframe...
dtypes is a Pandas Series. That means it contains index & values attributes. If you only need the column names:
headers = df.dtypes.index
it will return a list containing the column names of "df" dataframe.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With