I am trying to filter the columns in a pandas dataframe based on whether they are of type date or not. I can figure out which ones are, but then would have to parse that output or manually select columns. I want to select date columns automatically. Here's what I have so far as an example - I'd want to only select the 'date_col' column in this case.
import pandas as pd

df = pd.DataFrame([['Feb-2017', 1, 2],
                   ['Mar-2017', 1, 2],
                   ['Apr-2017', 1, 2],
                   ['May-2017', 1, 2]],
                  columns=['date_str', 'col1', 'col2'])
df['date_col'] = pd.to_datetime(df['date_str'])
df.dtypes
Out:
date_str object
col1 int64
col2 int64
date_col datetime64[ns]
dtype: object
To check for numeric columns, you could use df[c].dtype.kind in 'iufcb', where c is any given column name. The comparison yields a True or False boolean output.
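As a quick sketch of that kind-code check (the column names here are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [1.5, 2.5], 'c': ['x', 'y']})

# dtype.kind is a one-character code: 'i' signed int, 'u' unsigned int,
# 'f' float, 'c' complex, 'b' boolean
numeric_cols = [c for c in df.columns if df[c].dtype.kind in 'iufcb']
print(numeric_cols)  # ['a', 'b']
```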
To check the data types in a pandas DataFrame, we can use the dtypes attribute. It returns a Series with the data type of each column: the DataFrame's column names form the index of the resulting Series, and the corresponding data types are its values.
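For instance (a minimal sketch with made-up column names):

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2017-02-01']), 'n': [1]})

dtypes = df.dtypes            # Series: index = column names, values = dtypes
print(list(dtypes.index))     # ['date', 'n']
print(str(dtypes['date']))    # datetime64[ns]
```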
I just encountered this issue and found that @charlie-haley's answer isn't quite general enough for my use case. In particular, np.datetime64 doesn't seem to match datetime64[ns, UTC].
df['date_col'] = pd.to_datetime(df['date_str'], utc=True)
print(df.date_col.dtype)  # datetime64[ns, UTC]
You could also extend the list of dtypes to include other types, but that doesn't seem like a good solution for future compatibility, so I ended up using the is_datetime64_any_dtype function from the pandas API instead.
In:
from pandas.api.types import is_datetime64_any_dtype as is_datetime
df[[column for column in df.columns if is_datetime(df[column])]]
Out:
date_col
0 2017-02-01 00:00:00+00:00
1 2017-03-01 00:00:00+00:00
2 2017-04-01 00:00:00+00:00
3 2017-05-01 00:00:00+00:00
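Another option worth noting (a hedged aside, not from the answer above): select_dtypes accepts the string aliases 'datetime' and 'datetimetz', which together cover both tz-naive and tz-aware columns. A small sketch with illustrative column names:

```python
import pandas as pd

df = pd.DataFrame({'naive': pd.to_datetime(['2017-02-01']),
                   'aware': pd.to_datetime(['2017-02-01'], utc=True),
                   'n': [1]})

# 'datetime' matches tz-naive columns, 'datetimetz' matches tz-aware ones
both = df.select_dtypes(include=['datetime', 'datetimetz'])
print(list(both.columns))  # ['naive', 'aware']
```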
Pandas has a cool method called select_dtypes, which can take either exclude or include (or both) as parameters. It filters the DataFrame based on dtypes. So in this case, you would want to include columns of dtype np.datetime64. To filter by integers, you would use [np.int64, np.int32, np.int16, int]; for floats, [np.float64, np.float32, np.float16, float]; to filter by numerical columns only, [np.number]. (Note: the np.int and np.float aliases were removed in NumPy 1.24; use the builtin int and float instead.)
import numpy as np

df.select_dtypes(include=[np.datetime64])
Out:
date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01
In:
df.select_dtypes(include=[np.number])
Out:
col1 col2
0 1 2
1 1 2
2 1 2
3 1 2
A slightly uglier NumPy alternative:
In [102]: df.loc[:, [np.issubdtype(t, np.datetime64) for t in df.dtypes]]
Out[102]:
date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01
In [103]: df.loc[:, [np.issubdtype(t, np.number) for t in df.dtypes]]
Out[103]:
col1 col2
0 1 2
1 1 2
2 1 2
3 1 2