I have a dataframe, for example:
1
1.3
2,5
4
5
With the following code, I am trying to find out the types of the individual cells of my pandas dataframe:
import re

for i in range(len(data.columns)):
    print("number of columns: " + str(len(data.columns)))
    for j in range(len(data[i])):
        # replace the decimal dot with a comma (this turns the cell into a str)
        data[i][j] = re.sub(r'(\d*)\.(\d*)', r'\1,\2', str(data[i][j]))
        print(str(data[i][j]))
        print(" is of type: ", type(data[i][j]))
        if str(data[i][j]).isdigit():
            print(str(data[i][j]) + " contains a number")
The problem is that when a cell of the dataframe contains a dot, pandas thinks it is a string. So I used a regex to change the dot into a comma.
But after that, the types of all my dataframe cells changed to string. My question is: how can I know if a cell of the dataframe is an int or a float? I already tried isinstance(x, int).
Edit: How can I count the number of ints and floats using the output of df.apply(type), for example? I want to know how many cells of my column are int or float.
My second question is: why, when I have 2.5, does the dataframe give it the str type?
0 <class 'int'>
1 <class 'str'>
2 <class 'float'>
3 <class 'float'>
4 <class 'int'>
5 <class 'str'>
6 <class 'str'>
Thanks.
If you have a column with different types, e.g.
>>> df = pd.DataFrame(data = {"l": [1,"a", 10.43, [1,3,4]]})
>>> df
           l
0          1
1          a
2      10.43
3  [1, 3, 4]
Pandas will just state that this Series is of dtype object. However, you can get each entry's type by simply applying the type function:
>>> df.l.apply(type)
0 <type 'int'>
1 <type 'str'>
2 <type 'float'>
3 <type 'list'>
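To count how many cells are int or float, as asked in the edit, one possible approach is to map each cell to its type name and tally the names with value_counts. This is only a minimal sketch: the sample column below is hypothetical, and the type names are used as plain strings so that they can be looked up safely afterwards.

```python
import pandas as pd

# Hypothetical sample column mixing ints, floats and strings
df = pd.DataFrame({"l": [1, "a", 10.43, 4, "2,5"]})

# Map each cell to the name of its Python type, then tally the names
type_counts = df["l"].map(lambda x: type(x).__name__).value_counts()

# .get() returns 0 when a type does not occur in the column at all
n_int = type_counts.get("int", 0)
n_float = type_counts.get("float", 0)
n_str = type_counts.get("str", 0)
print(n_int, n_float, n_str)
```

Note that "2,5" is counted as a str here: with a comma as the decimal separator, Python and pandas do not treat it as a number.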
However, if you have a dataset with very different data types, you should probably reconsider its design.
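For the data in the question, one way to get a single numeric dtype is to normalise the decimal separator and then convert the whole column at once. The sketch below assumes the values arrive as strings, as they would when a file mixes "1.3" and "2,5"; the name raw is made up for illustration.

```python
import pandas as pd

# Hypothetical raw column as in the question: "2,5" uses a comma as decimal separator
raw = pd.Series(["1", "1.3", "2,5", "4", "5"])

# Replace the comma with a dot, then let pandas parse the numbers;
# errors="coerce" turns anything unparseable into NaN instead of raising
numeric = pd.to_numeric(raw.str.replace(",", ".", regex=False), errors="coerce")

print(numeric.dtype)
print(numeric.tolist())
```

If every decimal in the file uses a comma, passing decimal="," to pd.read_csv may be simpler than fixing the values after loading.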