Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python pandas: how to obtain the datatypes of objects in a mixed-datatype column?

Given a pandas.DataFrame with a column holding mixed datatypes, like e.g.

df = pd.DataFrame({'mixed': [pd.Timestamp('2020-10-04'), 999, 'a string']})

I was wondering how to obtain the datatypes of the individual objects in the column (Series)? Suppose I want to modify all entries in the Series that are of a certain type, like multiply all integers by some factor.

I could iteratively derive a mask and use it in loc, like

m = np.array([isinstance(v, int) for v in df['mixed']])

df.loc[m, 'mixed'] *= 10

# df
#                  mixed
# 0  2020-10-04 00:00:00
# 1                 9990
# 2             a string

That does the trick but I was wondering if there was a more pandastic way of doing this?

like image 255
FObersteiner Avatar asked Oct 04 '20 14:10

FObersteiner


People also ask

How do I print the data type of each column in Pandas?

Use Dataframe. dtypes to get Data types of columns in Dataframe. In Python's pandas module Dataframe class provides an attribute to get the data type information of each columns i.e. It returns a series object containing data type information of each column.

Can Pandas column have different data types?

Pandas uses other names for data types than Python, for example: object for textual data. A column in a DataFrame can only have one data type. The data type in a DataFrame's single column can be checked using dtype .

Can a Pandas series object hold data of different types?

Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.).

How do I pull data from a column in Pandas?

You can use the loc and iloc functions to access columns in a Pandas DataFrame. Let's see how. If we wanted to access a certain column in our DataFrame, for example the Grades column, we could simply use the loc function and specify the name of the column in order to retrieve it.


Video Answer


1 Answers

Still need call type

m = df.mixed.map(lambda x : type(x).__name__)=='int'
df.loc[m, 'mixed']*=10
df
                 mixed
0  2020-10-04 00:00:00
1                 9990
2             a string
like image 103
BENY Avatar answered Nov 03 '22 23:11

BENY