Consider the following three DataFrames:
import pandas as pd
df1 = pd.DataFrame([[1, 2], [4, 3]])
df2 = pd.DataFrame([[1, .2], [4, 3]])
df3 = pd.DataFrame([[1, 'a'], [4, 3]])
Here are the types of the values in the second column of each DataFrame:
In [56]: list(map(type, df1[1]))
Out[56]: [numpy.int64, numpy.int64]
In [57]: list(map(type, df2[1]))
Out[57]: [numpy.float64, numpy.float64]
In [58]: list(map(type, df3[1]))
Out[58]: [str, int]
In the first case, all ints are cast to numpy.int64. Fine. In the third case, there is basically no casting. However, in the second case, the integer (3) is cast to numpy.float64, probably because the other number in the column is a float.
How can I control the casting? In the second case, I want to have either [float64, int64] or [float, int] as types.
As a workaround, a callable formatting function can be used, as shown here:
import numpy as np
def printFloat(x):
    # np.modf returns (fractional part, integer part)
    if np.modf(x)[0] == 0:
        return str(int(x))
    else:
        return str(x)
pd.options.display.float_format = printFloat
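A formatter like the one above changes only how values are printed; the column itself should still be stored as float64. A minimal sketch, assuming a recent pandas version. The use of pd.option_context (not in the original) keeps the setting temporary, and print_float is just a renamed copy of the function above:

```python
import numpy as np
import pandas as pd


def print_float(x):
    # Print whole-number floats without a decimal part, other floats as-is.
    if np.modf(x)[0] == 0:
        return str(int(x))
    return str(x)


df2 = pd.DataFrame([[1, .2], [4, 3]])

# Apply the formatter only inside this block instead of changing it globally.
with pd.option_context("display.float_format", print_float):
    rendered = df2.to_string()

print(rendered)       # the stored value 3.0 is displayed without ".0"
print(df2[1].dtype)   # the dtype of the column is unchanged
```

So this addresses the display of the values, not the casting itself.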
A column in a DataFrame can only have one data type. The data type of a single column can be checked with its dtype attribute.
To check the data types in a pandas DataFrame, use the dtypes attribute. It returns a Series with the data type of each column: the column names of the DataFrame form the index of the resulting Series, and the corresponding data types are its values.
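As a rough illustration, the three DataFrames from the question can be inspected this way. The variable names name and second_col_kinds are only for this sketch, and the exact integer width may differ by platform, which is why the example looks at the dtype kind rather than the full name:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [4, 3]])
df2 = pd.DataFrame([[1, .2], [4, 3]])
df3 = pd.DataFrame([[1, 'a'], [4, 3]])

# DataFrame.dtypes returns a Series indexed by column label.
for name, df in [("df1", df1), ("df2", df2), ("df3", df3)]:
    print(name, df.dtypes.to_dict())

# A single column (a Series) has exactly one dtype.
# kind: 'i' = signed integer, 'f' = float, 'O' = object
second_col_kinds = [df[1].dtype.kind for df in (df1, df2, df3)]
print(second_col_kinds)
```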
In order to convert data types in pandas, there are three basic options: use astype() to force an appropriate dtype; create a custom function to convert the data; or use pandas functions such as to_numeric() or to_datetime().
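A short sketch of these three options, on made-up sample data. The helper parse_percent and all variable names are hypothetical and only serve the illustration:

```python
import pandas as pd

raw = pd.Series(["1", "2", "3"])

# 1. astype() forces a dtype and raises if a value cannot be converted.
as_int = raw.astype("int64")


# 2. A custom function (hypothetical helper), applied element by element.
def parse_percent(text):
    return float(text.rstrip("%")) / 100


as_fraction = pd.Series(["10%", "25%"]).apply(parse_percent)

# 3. to_numeric() with errors="coerce" turns unparseable values into NaN,
#    which makes the result a float column.
coerced = pd.to_numeric(pd.Series(["1", "2", "x"]), errors="coerce")

print(as_int.dtype, as_fraction.dtype, coerced.dtype)
```

In every case the result is again a column with one dtype; none of these options produces a column that mixes int64 and float64 values.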
The dtypes property is used to find the dtypes in the DataFrame. This returns a Series with the data type of each column. The result's index is the original DataFrame's columns. Columns with mixed types are stored with the object dtype.
The columns of a pandas DataFrame (or a Series) are homogeneously typed. You can inspect this with dtype (or DataFrame.dtypes):
In [14]: df1[1].dtype
Out[14]: dtype('int64')
In [15]: df2[1].dtype
Out[15]: dtype('float64')
In [16]: df3[1].dtype
Out[16]: dtype('O')
Only the generic 'object' dtype can hold any Python object, and in this way can also contain mixed types:
In [18]: df2 = pd.DataFrame([[1,.2],[4,3]], dtype='object')
In [19]: df2[1].dtype
Out[19]: dtype('O')
In [20]: list(map(type, df2[1]))
Out[20]: [float, int]
But this is really not recommended, as this defeats the purpose (or at least the performance) of pandas.
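For comparison, here is a small side-by-side sketch of the object column and the default float column, written so that it also runs on Python 3. The names mixed and typed are chosen for this example only:

```python
import pandas as pd

mixed = pd.DataFrame([[1, .2], [4, 3]], dtype="object")
typed = pd.DataFrame([[1, .2], [4, 3]])

# Collect the type of each element in the second column.
mixed_types = [type(v) for v in mixed[1]]
typed_types = [type(v) for v in typed[1]]

print(mixed[1].dtype, mixed_types)   # object column keeps the Python objects
print(typed[1].dtype, typed_types)   # default column holds only floats
```

The object column keeps the original Python float and int, but pandas then treats it as a column of arbitrary objects rather than as numeric data.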
Is there a reason you specifically want both ints and floats in the same column?