I am trying to filter the columns in a pandas dataframe based on whether they are of type date or not. I can figure out which ones are, but then would have to parse that output or manually select columns. I want to select date columns automatically. Here's what I have so far as an example - I'd want to only select the 'date_col' column in this case.
import pandas as pd

df = pd.DataFrame([['Feb-2017', 1, 2],
                   ['Mar-2017', 1, 2],
                   ['Apr-2017', 1, 2],
                   ['May-2017', 1, 2]],
                  columns=['date_str', 'col1', 'col2'])
df['date_col'] = pd.to_datetime(df['date_str'])
df.dtypes
Out:
date_str object
col1 int64
col2 int64
date_col datetime64[ns]
dtype: object
To check for numeric columns, you could use df[c].dtype.kind in 'iufcb', where c is any given column name. The comparison yields a True or False boolean output.
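As a quick sketch of that kind-code check (the column names here are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [1.5, 2.5], 'c': ['x', 'y']})

# dtype.kind is a one-character code: 'i' signed int, 'u' unsigned int,
# 'f' float, 'c' complex, 'b' boolean
numeric_cols = [c for c in df.columns if df[c].dtype.kind in 'iufcb']
print(numeric_cols)  # ['a', 'b']
```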
To check the data types in a pandas DataFrame, we can use the dtypes attribute. It returns a Series with the data type of each column: the DataFrame's column names form the index of the resulting Series, and the corresponding data types are its values.
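For instance (a minimal sketch with made-up column names):

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2017-02-01']), 'n': [1]})

dtypes = df.dtypes            # Series: index = column names, values = dtypes
print(list(dtypes.index))     # ['date', 'n']
print(str(dtypes['date']))    # datetime64[ns]
```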
I just encountered this issue and found that @charlie-haley's answer isn't quite general enough for my use case. In particular, np.datetime64 doesn't seem to match datetime64[ns, UTC].
df['date_col'] = pd.to_datetime(df['date_str'], utc=True)
print(df.date_col.dtype)  # datetime64[ns, UTC]
You could also extend the list of dtypes to include other types, but that doesn't seem like a good solution for future compatibility, so I ended up using the is_datetime64_any_dtype function from the pandas API instead.
In:
from pandas.api.types import is_datetime64_any_dtype as is_datetime
df[[column for column in df.columns if is_datetime(df[column])]]
Out:
date_col
0 2017-02-01 00:00:00+00:00
1 2017-03-01 00:00:00+00:00
2 2017-04-01 00:00:00+00:00
3 2017-05-01 00:00:00+00:00
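Another option worth noting (a hedged aside, not from the answer above): select_dtypes accepts the string aliases 'datetime' and 'datetimetz', which together cover both tz-naive and tz-aware columns. A small sketch with illustrative column names:

```python
import pandas as pd

df = pd.DataFrame({'naive': pd.to_datetime(['2017-02-01']),
                   'aware': pd.to_datetime(['2017-02-01'], utc=True),
                   'n': [1]})

# 'datetime' matches tz-naive columns, 'datetimetz' matches tz-aware ones
both = df.select_dtypes(include=['datetime', 'datetimetz'])
print(list(both.columns))  # ['naive', 'aware']
```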
Pandas has a cool method called select_dtypes, which can take either exclude or include (or both) as parameters. It filters the DataFrame based on dtypes. So in this case, you would want to include columns of dtype np.datetime64. To filter by integers, you would use [np.int64, np.int32, np.int16, int]; for floats, [np.float64, np.float32, np.float16, float]; to filter by numerical columns only, [np.number]. (Note: the np.int and np.float aliases were removed in NumPy 1.24; use the builtin int and float instead.)
import numpy as np

df.select_dtypes(include=[np.datetime64])
Out:
date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01
In:
df.select_dtypes(include=[np.number])
Out:
col1 col2
0 1 2
1 1 2
2 1 2
3 1 2
A slightly uglier NumPy alternative:
In [102]: df.loc[:, [np.issubdtype(t, np.datetime64) for t in df.dtypes]]
Out[102]:
date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01
In [103]: df.loc[:, [np.issubdtype(t, np.number) for t in df.dtypes]]
Out[103]:
col1 col2
0 1 2
1 1 2
2 1 2
3 1 2