Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get pandas.DataFrame columns containing specific dtype

Tags:

python

pandas

I'm using df.columns.values to make a list of column names which I then iterate over and make charts, etc... but when I set this up I overlooked the non-numeric columns in the df. Now, I'd much rather not simply drop those columns from the df (or a copy of it). Instead, I would like to find a slick way to eliminate them from the list of column names.

Now I have:

names = df.columns.values 

what I'd like to get to is something that behaves like:

names = df.columns.values(column_type=float64) 

Is there any slick way to do this? I suppose I could make a copy of the df, and drop those non-numeric columns before doing columns.values, but that strikes me as clunky.

Welcome any inputs/suggestions. Thanks.

like image 482
Charlie_M Avatar asked Jul 23 '14 04:07

Charlie_M


People also ask

How do you select columns based on Dtype pandas?

Pandas Select columns based on their data typePandas dataframe has the function select_dtypes , which has an include parameter. Specify the datatype of the columns which you want select using this parameter. This can be useful to you if you want to select only specific data type columns from the dataframe.

How do I get certain columns from a data frame?

You can use the loc and iloc functions to access columns in a Pandas DataFrame. Let's see how. If we wanted to access a certain column in our DataFrame, for example the Grades column, we could simply use the loc function and specify the name of the column in order to retrieve it.

How do you filter categorical columns in pandas?

For categorical data you can use Pandas string functions to filter the data. The startswith() function returns rows where a given column contains values that start with a certain value, and endswith() which returns rows with values that end with a certain value.


2 Answers

Someone will give you a better answe than this possibly, but one thing I tend to do is if all my numeric data are int64 or float64 objects, then you can create a dict of the column data types and then use the values to create your list of columns.

So for example, in a dataframe where I have columns of type float64, int64 and object firstly you can look at the data types as so:

DF.dtypes

and if they conform to the standard whereby the non-numeric columns of data are all object types (as they are in my dataframes), then you can do the following to get a list of the numeric columns:

[key for key in dict(DF.dtypes) if dict(DF.dtypes)[key] in ['float64', 'int64']]

Its just a simple list comprehension. Nothing fancy. Again, though whether this works for you will depend upon how you set up you dataframe...

like image 149
Woody Pride Avatar answered Nov 09 '22 16:11

Woody Pride


dtypes is a Pandas Series. That means it contains index & values attributes. If you only need the column names:

headers = df.dtypes.index

it will return a list containing the column names of "df" dataframe.

like image 24
Arthur Zennig Avatar answered Nov 09 '22 16:11

Arthur Zennig