
Why can't I get the type of a Pandas cell from a function?

Tags: python, pandas

I'd like to get the type of the argument I'm passing to a function (I think it's a Pandas Series, but I want to make sure) and write it into a new column in a Pandas DataFrame. Why does this

import numpy as np
import pandas as pd

data = np.array([['','Col1','Col2', 'Col3'],
                ['Row1','cd, d', '1, 2', 'ab; cd'],
                ['Row2','e, f', '5, 6', 'ef; gh'],
                ['Row3','a, b', '3, 4', 'ij; kl']])

df = pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:])

def find_type(my_arg):
    return type(my_arg)

df['types'] = find_type(df['Col1'])

give me

AttributeError: 'int' object has no attribute 'index'

and what's the right way to do this?

Zubo asked Dec 24 '22 11:12


2 Answers

In case this helps: the columns of a DataFrame (which are Series) each have a dtype such as float64, int32, or object, where object is basically a catch-all for non-numeric data like strings.

Beyond that, the individual cells have types of their own. If the dtype is some sort of int or float, then every cell is an int or a float as well. If the dtype is object, the cells can be anything, including a mix of types.

Here's an example:

>>> df=pd.DataFrame({'a':[1.1,2.2],'b':[1,2],
                     'c':['cat','dog'],'d':['rat',3]})

>>> df.dtypes

a    float64
b      int64
c     object
d     object
dtype: object

>>> df.applymap(type)

                 a              b              c              d
0  <class 'float'>  <class 'int'>  <class 'str'>  <class 'str'>
1  <class 'float'>  <class 'int'>  <class 'str'>  <class 'int'>
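
To write the per-cell types of one column into a new column, which seems to be what the question is after, something along these lines should work. This is only a sketch: the column names d_types and d_type_names are made up for illustration, and the frame is the same one as in the example above. As far as I know, applymap is deprecated in pandas 2.1 and later in favor of DataFrame.map, so Series.map is used here because it exists in both old and new versions.

```python
import pandas as pd

# Same frame as in the example above; column 'd' mixes a str and an int.
df = pd.DataFrame({'a': [1.1, 2.2], 'b': [1, 2],
                   'c': ['cat', 'dog'], 'd': ['rat', 3]})

# Series.map calls type() once per cell and returns a Series on the same
# index, so the result can be assigned directly as a new column.
# 'd_types' is a hypothetical column name.
df['d_types'] = df['d'].map(type)

# Storing the class name as a string is often easier to read and compare.
# 'd_type_names' is a hypothetical column name.
df['d_type_names'] = df['d'].map(lambda cell: type(cell).__name__)

print(df[['d', 'd_type_names']])
```

The difference from the code in the question is that type() is applied to each cell, giving one value per row, instead of being called once on the whole column.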

I'm not sure whether this is what you're trying to do, but I couldn't find a simple explanation of this to link to, so I figured I'd write one up quickly.

JohnE answered Dec 28 '22 05:12


You're looking for pandas.DataFrame.dtypes.

>>> df.dtypes
Col1    object
Col2    object
Col3    object
dtype: object

>>> dict(df.dtypes)
{'Col1': dtype('O'), 'Col2': dtype('O'), 'Col3': dtype('O')}

>>> df['Col1'].dtypes
dtype('O')

If you do type(df['Col1']), Python will tell you that the type is pandas.core.series.Series, which isn't particularly useful. What you need is the type of the data stored in the column, not the fact that the column is implemented as a Series.
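
A small sketch of that distinction, assuming a one-column stand-in for the question's frame (the variable names are just for illustration). The exact dtype printed for a column of strings may depend on the pandas version, so no output is shown here.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame, reduced to one column.
df = pd.DataFrame({'Col1': ['cd, d', 'e, f', 'a, b']},
                  index=['Row1', 'Row2', 'Row3'])
col = df['Col1']

# type() describes the container: a single column is always a Series.
container_type = type(col)

# .dtype describes the data stored inside that container.
column_dtype = col.dtype

# pandas.api.types offers predicates for branching on the dtype.
is_numeric = pd.api.types.is_numeric_dtype(col)

print(container_type.__name__, column_dtype, is_numeric)
```

Calling type() on the column returns a single class for the whole Series, which is why it cannot produce one value per row.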

RagingRoosevelt answered Dec 28 '22 07:12