I have the following DataFrame:
import datetime
import pandas as pd

df = pd.DataFrame({'a': [100, 3, 4],
                   'b': [20.1, 2.3, 45.3],
                   'c': [datetime.time(23, 52), 30, 1.00]})
and I would like to detect the subtypes in each column without explicitly programming a loop, if possible.
I am looking for the following output:
column a = [int]
column b = [float]
column c = [datetime.time, int, float]
You should appreciate that with Pandas you can have 2 broad types of series:
1. A native NumPy dtype, e.g. np.datetime64 or bool: the data is held natively in a NumPy array and every element has the same type.
2. object dtype: used for series with mixed types or types which cannot be held natively in a NumPy array. The series is structured as a sequence of pointers to arbitrary Python objects and is generally inefficient.
The reason for this preamble is that you should only ever need to apply element-wise logic to the second type. Data in the first category is homogeneous by nature.
So you should separate your logic accordingly.
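For instance, a minimal sketch (the series names here are illustrative) showing the two categories:

import pandas as pd

s_native = pd.Series([True, False, True])  # homogeneous data, NumPy-backed bool dtype
s_object = pd.Series([1, 'a', 2.5])        # mixed types fall back to object dtype

print(s_native.dtype)  # bool
print(s_object.dtype)  # object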
Use pd.DataFrame.dtypes:
print(df.dtypes)
a int64
b float64
c object
dtype: object
Isolate these object dtype series via pd.DataFrame.select_dtypes
and then use a dictionary comprehension:
obj_types = {col: set(map(type, df[col])) for col in df.select_dtypes(include=[object])}
print(obj_types)
{'c': {<class 'datetime.time'>, <class 'int'>, <class 'float'>}}
You will need to do a little more work to get the exact format you require, but the above should be your plan of attack.
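For instance, one possible sketch (an assumption about the format you want, not the only approach) takes the column's native dtype where the data is homogeneous and the per-element types for object columns; note the names differ slightly from the question's exact output (e.g. int64 rather than int):

# use the native dtype for homogeneous columns,
# fall back to per-element types for object columns
for col in df.columns:
    if df[col].dtype == object:
        found = set(map(type, df[col]))
    else:
        found = {df[col].dtype.type}
    print(f"column {col} = {sorted(t.__name__ for t in found)}")

# column a = ['int64']
# column b = ['float64']
# column c = ['float', 'int', 'time']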
You can just use the Python built-in function map:
column_c = list(map(type, df['c']))
print(column_c)
output:
[<class 'datetime.time'>, <class 'int'>, <class 'float'>]
types = {i: set(map(type, df[i])) for i in df.columns}
# this returns the unique element types of every column in a dict
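Note that for the NumPy-backed columns this returns NumPy scalar types rather than plain Python ones, so for the example DataFrame above you would see something like:

print(types)
{'a': {<class 'numpy.int64'>}, 'b': {<class 'numpy.float64'>}, 'c': {<class 'datetime.time'>, <class 'int'>, <class 'float'>}}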