
TypeError: '<' not supported between instances of 'float' and 'str' when using shapiro test with scipy

I'm trying to run the Shapiro test on each column of a pandas DataFrame, grouped by the column "code".

This is what my df looks like:

>>> name  code  2020-10-22  2020-10-23  2020-10-24 ...
0   a     1     0.05423     0.1254      0.1432
1   b     1     0.57289     0.0092      0.2314
2   c     2     0.1205      0.0072      0.12
3   d     3     0.3234      0.231       0.231
...

I have 80 rows with 6 different codes (1,2,3,4,5,6).

I want to run the Shapiro test for each column and each code. For example, take the column 2020-10-22, keep only the rows with code (treatment) 1, and run the Shapiro test on them.

I have tried to do it using the following loop:

shapiros=[]

for variable in df.columns[2:]:
    tmp=df[['code',variable]]
    tmp=tmp[tmp[variable].notnull()]
    
    for i in tmp.code.unique().tolist():
        shapiro_test = stats.shapiro(tmp[tmp['code'] == i])
        shapiros.append(shapiro_test)

but then I get this error:

---> 13         shapiro_test = stats.shapiro(tmp[tmp['code'] == i])

TypeError: '<' not supported between instances of 'float' and 'str'

I saw that this error can occur because of null values, but I have removed those using notnull(). I have checked that notnull() works by printing the length of "tmp" in each iteration, and it does change.

In addition, it seems like the type of both is the same (object):

for variable in df.columns[2:]:
    tmp=df[['code',variable]]
    print(tmp.dtypes)
    tmp=tmp[tmp[variable].notnull()]
    
    for i in tmp.code.unique().tolist():
        print(type(i))


>>>code           object
2020-10-22    float64
dtype: object
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
... 

(it prints the same for all the days).

What could be the problem? How can I calculate the Shapiro test for each column and each code?

Reut asked Sep 16 '25 14:09


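1 Answer

The "code" column currently holds strings (dtype object), while the date columns are float64. Your loop passes both columns to stats.shapiro, so the test ends up comparing str values with float values, which raises the TypeError. Convert the "code" column to float/int: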

df['code'] = df['code'].astype(float)
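
For the per-column, per-code loop itself, here is a minimal sketch rather than a definitive solution. The toy DataFrame is made up to resemble the one in the question, and the names results and shapiro_df are only illustrative. It assumes the conversion above (using int here) and, as a further assumption on my part, passes only the value column to stats.shapiro: as far as I can tell, passing the two-column frame would otherwise mix the code values into the sample being tested.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data shaped like the frame in the question;
# 'code' starts out as strings, as in the printed dtypes.
df = pd.DataFrame({
    "name": list("abcdefgh"),
    "code": ["1", "1", "1", "1", "2", "2", "2", "2"],
    "2020-10-22": [0.054, 0.573, 0.121, 0.323, 0.210, 0.180, 0.402, 0.095],
    "2020-10-23": [0.125, 0.009, 0.007, 0.231, 0.310, None, 0.150, 0.220],
})

# Convert the codes to a numeric type (int is used here instead of float).
df["code"] = df["code"].astype(int)

results = []
for variable in df.columns[2:]:
    # Drop rows where this particular column is missing.
    tmp = df[["code", variable]].dropna(subset=[variable])
    # Group by code and hand only the value column to the test.
    for code, values in tmp.groupby("code")[variable]:
        if len(values) < 3:  # shapiro needs at least 3 observations
            continue
        statistic, p_value = stats.shapiro(values)
        results.append(
            {"column": variable, "code": code, "W": statistic, "p": p_value}
        )

# One row per (column, code) pair.
shapiro_df = pd.DataFrame(results)
print(shapiro_df)
```

Keeping the column name and code next to each result makes it easier to see which test belongs to which group than a flat list of results would.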
Elkoss answered Sep 18 '25 11:09