
Mixed types of elements in DataFrame's column

Consider the following three DataFrames:

import pandas as pd

df1 = pd.DataFrame([[1,2],[4,3]])
df2 = pd.DataFrame([[1,.2],[4,3]])
df3 = pd.DataFrame([[1,'a'],[4,3]])

Here are the types of the elements in the second column of each DataFrame:

In [56]: map(type,df1[1])
Out[56]: [numpy.int64, numpy.int64]

In [57]: map(type,df2[1])
Out[57]: [numpy.float64, numpy.float64]

In [58]: map(type,df3[1])
Out[58]: [str, int]

In the first case, all ints are cast to numpy.int64. Fine. In the third case, there is basically no casting. However, in the second case, the integer (3) is cast to numpy.float64, probably because the other number in the column is a float.

How can I control the casting? In the second case, I want to have either [float64, int64] or [float, int] as types.

Workaround:

One workaround is to use a callable as the float display format, as shown here.

import numpy as np
import pandas as pd

def printFloat(x):
    # Print whole-number floats without the trailing ".0"
    if np.modf(x)[0] == 0:
        return str(int(x))
    else:
        return str(x)
pd.options.display.float_format = printFloat
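
Note that a display formatter of this kind only changes how values are printed; the stored values remain float64. The following is a minimal sketch, assuming Python 3 and a reasonably recent pandas. The name format_float and the use of DataFrame.to_string instead of the global option are choices made for this illustration, not part of the original workaround.

```python
import numpy as np
import pandas as pd


def format_float(x):
    """Render whole-number floats without a trailing '.0' (illustrative helper)."""
    if np.modf(x)[0] == 0:
        return str(int(x))
    return str(x)


df2 = pd.DataFrame([[1, .2], [4, 3]])

# Pass the formatter for this one rendering instead of changing global options.
rendered = df2.to_string(float_format=format_float)
print(rendered)

# The underlying column is unchanged: it is still float64, and 3 is stored as 3.0.
print(df2[1].dtype)
```

Passing the formatter per call keeps the change local; setting pd.options.display.float_format, as in the workaround above, applies it to every DataFrame printed afterwards.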
Dror asked Dec 08 '14 16:12


1 Answer

Each column of a pandas DataFrame (and each Series) has a single, homogeneous type. You can inspect it with dtype (or DataFrame.dtypes for all columns at once):

In [14]: df1[1].dtype
Out[14]: dtype('int64')

In [15]: df2[1].dtype
Out[15]: dtype('float64')

In [16]: df3[1].dtype
Out[16]: dtype('O')
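
This explains the upcast in the second case: a column holding 0.2 and 3 is stored as one float64 array, so 3 becomes 3.0. The sketch below reproduces the inspection under Python 3, where map returns an iterator, so a list comprehension is used instead. It assumes a recent pandas; the exact integer dtype can depend on platform and NumPy version, so the checks are written against dtype families rather than a specific width.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [4, 3]])
df2 = pd.DataFrame([[1, .2], [4, 3]])
df3 = pd.DataFrame([[1, 'a'], [4, 3]])

# Show the column dtype next to the type of each element in that column.
for name, df in [("df1", df1), ("df2", df2), ("df3", df3)]:
    print(name, df[1].dtype, [type(x).__name__ for x in df[1]])

# In df2 the integer 3 was stored as the float 3.0 to fit the float64 column.
stored = df2[1].tolist()
print(stored)
```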

Only the generic 'object' dtype can hold arbitrary Python objects, so it is the only dtype that can contain mixed types:

In [18]: df2 = pd.DataFrame([[1,.2],[4,3]], dtype='object')

In [19]: df2[1].dtype
Out[19]: dtype('O')

In [20]: map(type,df2[1])
Out[20]: [float, int]

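But this is really not recommended, since it defeats the purpose of pandas, or at least its performance: operations on an object column work on individual Python objects rather than on a typed NumPy array.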
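
If a column ended up as object by accident, it can be converted back to a homogeneous numeric dtype. A minimal sketch, assuming Python 3 and a recent pandas; the variable names df2_obj and restored are made up for this illustration, and pd.to_numeric is one possible way to do the conversion (astype(float) would be another).

```python
import pandas as pd

# An object column keeps the original Python objects, so float and int coexist.
df2_obj = pd.DataFrame([[1, .2], [4, 3]], dtype='object')
element_types = [type(x) for x in df2_obj[1]]
print(df2_obj[1].dtype, element_types)

# Converting back gives a single numeric dtype again; the int 3 becomes 3.0.
restored = pd.to_numeric(df2_obj[1])
print(restored.dtype, restored.tolist())
```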

Is there a reason you specifically want both ints and floats in the same column?

joris answered Oct 26 '22 12:10