TypeError: '<' not supported between instances of 'float' and 'str' when using shapiro test with scipy

Question

I'm trying to run the Shapiro test on each column of a pandas DataFrame, separately for each value of the column "code".

This is what my df looks like:

  name  code  2020-10-22  2020-10-23  2020-10-24  ...
0    a     1     0.05423      0.1254      0.1432
1    b     1     0.57289      0.0092      0.2314
2    c     2      0.1205      0.0072        0.12
3    d     3      0.3234       0.231       0.231
...

I have 80 rows with 6 different codes (1,2,3,4,5,6).

I want to run the Shapiro test on each column, separately for each code. For example: take the 2020-10-22 column, keep only the rows with code 1 (treatment no. 1), and run the Shapiro test on those values.
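As a minimal sketch of that single step (one date column, one code), assuming a small made-up frame with the same layout as the one above (the values are illustrative, not the real data), the selection can pass only the numeric column to stats.shapiro. The code column is kept as str here, matching the dtype output shown further down.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data in the question's layout; code is stored as str.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "code": ["1", "1", "1", "1", "2", "2"],
    "2020-10-22": [0.05423, 0.57289, 0.1205, 0.3234, 0.231, None],
})

# Rows with code "1", only the 2020-10-22 column, nulls removed.
values = df.loc[df["code"] == "1", "2020-10-22"].dropna()

# values is a float Series, so shapiro only ever sees measurements.
w, p = stats.shapiro(values)
print(f"W={w:.4f}, p={p:.4f}")
```

Using .loc with a row mask and a single column label returns a Series rather than a two-column DataFrame, which is the key difference from the loop below.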

I have tried to do it using the following loop:

from scipy import stats

shapiros = []

for variable in df.columns[2:]:
    tmp = df[['code', variable]]
    tmp = tmp[tmp[variable].notnull()]

    for i in tmp.code.unique().tolist():
        shapiro_test = stats.shapiro(tmp[tmp['code'] == i])
        shapiros.append(shapiro_test)

but then I get this error:

---> 13         shapiro_test = stats.shapiro(tmp[tmp['code'] == i])

TypeError: '<' not supported between instances of 'float' and 'str'

I read that this error can occur when there are null values, but I have removed them with notnull(). I checked that notnull() works by printing the length of tmp in each iteration, and it does change.

In addition, it seems that the type of both is the same (object):

for variable in df.columns[2:]:
    tmp = df[['code', variable]]
    print(tmp.dtypes)
    tmp = tmp[tmp[variable].notnull()]

    for i in tmp.code.unique().tolist():
        print(type(i))


code           object
2020-10-22    float64
dtype: object
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
...

(it prints the same for all the days).
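One way to check what stats.shapiro actually receives in the loop above is to look at the NumPy array behind tmp[tmp['code'] == i]. A sketch with a hypothetical three-row frame mirroring tmp: because that selection keeps both columns, code (str) and the date column (float), the resulting array is object-typed and mixes strings with floats. That is consistent with the '<' comparison error, presumably raised when scipy sorts the data.

```python
import pandas as pd

# Hypothetical frame mirroring tmp = df[['code', variable]] from the question.
tmp = pd.DataFrame({"code": ["1", "1", "2"],
                    "2020-10-22": [0.05423, 0.57289, 0.1205]})

# This is the object the loop passes to stats.shapiro: both columns, not one.
subset = tmp[tmp["code"] == "1"]
arr = subset.to_numpy()

print(arr.dtype)    # object: str codes and float values share one array
print(arr.ravel())  # the codes sit in the same flattened sample as the data
```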

What could be the problem? How can I calculate the Shapiro test for each column and each code?

Elkoss · Accepted Answer

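You have to convert the column code to float/int; as your output shows, it is currently str. Because tmp[tmp['code'] == i] passes both columns to stats.shapiro, the str codes end up in the same array as the float values, and comparing them raises the error. Try: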

df['code'] = df['code'].astype(float)
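Note that converting code to float only removes the str/float mix: tmp[tmp['code'] == i] still contains the code column, and since shapiro flattens its input, the code values themselves would be included in the tested sample. A sketch that passes only the date column for each code, assuming hypothetical toy data in the question's layout; storing results in a dict keyed by (column, code) and skipping groups with fewer than 3 values are illustrative choices, not part of the answer above.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data in the question's layout (name, code, one column per date).
df = pd.DataFrame({
    "name": list("abcdefgh"),
    "code": ["1", "1", "1", "2", "2", "2", "2", "1"],
    "2020-10-22": [0.05423, 0.57289, 0.1205, 0.3234, 0.11, 0.42, 0.27, 0.31],
    "2020-10-23": [0.1254, 0.0092, 0.0072, 0.231, None, 0.15, 0.08, 0.19],
})

shapiros = {}
for variable in df.columns[2:]:                       # the date columns
    # groupby yields (code, Series of this column's values for that code)
    for code, values in df.groupby("code")[variable]:
        values = values.dropna()                      # drop nulls per column and code
        if len(values) < 3:                           # shapiro needs at least 3 values
            continue
        w, p = stats.shapiro(values)                  # only the float column is tested
        shapiros[(variable, code)] = (w, p)

for key, (w, p) in shapiros.items():
    print(key, f"W={w:.4f}", f"p={p:.4f}")
```

With this selection the dtype of code no longer matters, so the astype(float) conversion becomes optional.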


Tags: python, for-loop, pandas, scipy, scipy.stats

Reut
