I have a dataframe as follows. I need to do ANOVA on this between three conditions. The dataframe looks like:
import pandas as pd

data0 = pd.DataFrame({'Names': ['CTA15', 'CTA15', 'AC007', 'AC007', 'AC007', 'AC007'],
                      'value': [22, 22, 2, 2, 2, 5],
                      'condition': ['NON', 'NON', 'YES', 'YES', 'RE', 'RE']})
I need to run an ANOVA test for each pair of conditions (YES vs. NON, NON vs. RE, and YES vs. RE) within each value of Names. I know I could do it like this:
NON = data0.query('condition == "NON" and Names == "CTA15"')
no = NON.value
YES = data0.query('condition == "YES" and Names == "CTA15"')
Y = YES.value
Then perform a one-way ANOVA as follows:
from scipy import stats
f_val, p_val = stats.f_oneway(no, Y)
print("One-way ANOVA P =", p_val)
It would be great if there were a more elegant solution, as my actual data frame is big and has many names and conditions to compare.
Consider the following sample DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Names': np.random.randint(1, 10, 1000),
                   'value': np.random.randn(1000),
                   'condition': np.random.choice(['NON', 'YES', 'RE'], 1000)})
df.head()
Out:
   Names condition     value
0      4        RE  0.844120
1      4       NON -0.440285
2      5       YES  0.559497
3      4        RE  0.472425
4      9       YES  0.205906
The following groups the DataFrame by Names, and then passes each condition group to ANOVA:
import scipy.stats as ss
for name_group in df.groupby('Names'):
    # name_group is a (name, sub-DataFrame) tuple; collect one sample of values per condition
    samples = [condition[1] for condition in name_group[1].groupby('condition')['value']]
    f_val, p_val = ss.f_oneway(*samples)
    print('Name: {}, F value: {:.3f}, p value: {:.3f}'.format(name_group[0], f_val, p_val))
Name: 1, F value: 0.138, p value: 0.871
Name: 2, F value: 1.458, p value: 0.237
Name: 3, F value: 0.742, p value: 0.479
Name: 4, F value: 2.718, p value: 0.071
Name: 5, F value: 0.255, p value: 0.776
Name: 6, F value: 1.731, p value: 0.182
Name: 7, F value: 0.269, p value: 0.764
Name: 8, F value: 0.474, p value: 0.624
Name: 9, F value: 1.226, p value: 0.297
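The loop above tests all three conditions at once for each name. If you want the pairwise tests from the question (YES vs. NON, NON vs. RE, YES vs. RE), one possible sketch is to loop over pairs of conditions with itertools.combinations and collect the results in a DataFrame. The helper name pairwise_anova and the seeded sample data are assumptions made for illustration. Note that a one-way ANOVA on two groups is equivalent to an equal-variance two-sample t-test, and that these p values are not corrected for multiple comparisons.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats

# Seeded sample data with the same layout as the DataFrame above (illustrative only).
rng = np.random.default_rng(0)
df = pd.DataFrame({'Names': rng.integers(1, 10, 1000),
                   'value': rng.standard_normal(1000),
                   'condition': rng.choice(['NON', 'YES', 'RE'], 1000)})


def pairwise_anova(frame):
    """Hypothetical helper: one-way ANOVA for every pair of conditions within each name."""
    rows = []
    for name, name_df in frame.groupby('Names'):
        # Map each condition present for this name to its array of values.
        samples = {cond: values.to_numpy()
                   for cond, values in name_df.groupby('condition')['value']}
        # Only pairs of conditions that actually occur for this name are tested.
        for first, second in combinations(sorted(samples), 2):
            f_val, p_val = stats.f_oneway(samples[first], samples[second])
            rows.append({'Names': int(name), 'group1': first, 'group2': second,
                         'F': f_val, 'p': p_val})
    return pd.DataFrame(rows)


pairwise = pairwise_anova(df)
print(pairwise.head())
```

Because the results are in a DataFrame, you can filter or sort them afterwards, for example pairwise[pairwise['p'] < 0.05].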
For post-hoc tests, you can use statsmodels (as explained here):
from statsmodels.stats.multicomp import pairwise_tukeyhsd
for name, grouped_df in df.groupby('Names'):
    print('Name {}'.format(name), pairwise_tukeyhsd(grouped_df['value'], grouped_df['condition']))
Name 1 Multiple Comparison of Means - Tukey HSD,FWER=0.05
============================================
group1 group2 meandiff   lower  upper reject
--------------------------------------------
   NON     RE   0.0086 -0.5129 0.5301  False
   NON    YES   0.0084 -0.4817 0.4986  False
    RE    YES  -0.0002 -0.5217 0.5214  False
--------------------------------------------
Name 2 Multiple Comparison of Means - Tukey HSD,FWER=0.05
============================================
group1 group2 meandiff   lower  upper reject
--------------------------------------------
   NON     RE  -0.0089 -0.5299 0.5121  False
   NON    YES    0.083 -0.4182 0.5842  False
    RE    YES   0.0919 -0.4008 0.5846  False
--------------------------------------------
Name 3 Multiple Comparison of Means - Tukey HSD,FWER=0.05
============================================
group1 group2 meandiff   lower  upper reject
--------------------------------------------
   NON     RE   0.2401 -0.3136 0.7938  False
   NON    YES   0.2765 -0.2903 0.8432  False
    RE    YES   0.0364 -0.5052  0.578  False
--------------------------------------------
Name 4 Multiple Comparison of Means - Tukey HSD,FWER=0.05
============================================
group1 group2 meandiff   lower  upper reject
--------------------------------------------
   NON     RE   0.0894 -0.5825 0.7613  False
   NON    YES  -0.0437 -0.7418 0.6544  False
    RE    YES  -0.1331 -0.6949 0.4287  False
--------------------------------------------
Name 5 Multiple Comparison of Means - Tukey HSD,FWER=0.05
============================================
group1 group2 meandiff   lower  upper reject
--------------------------------------------
   NON     RE  -0.4264 -0.9495 0.0967  False
   NON    YES   0.0439 -0.4264 0.5142  False
    RE    YES   0.4703 -0.0155 0.9561  False
--------------------------------------------
Name 6 Multiple Comparison of Means - Tukey HSD,FWER=0.05
============================================
group1 group2 meandiff   lower  upper reject
--------------------------------------------
   NON     RE   0.0649 -0.4971  0.627  False
   NON    YES   -0.406 -0.9405 0.1285  False
    RE    YES  -0.4709 -1.0136 0.0717  False
--------------------------------------------
Name 7 Multiple Comparison of Means - Tukey HSD,FWER=0.05
============================================
group1 group2 meandiff   lower  upper reject
--------------------------------------------
   NON     RE   0.3111 -0.2766 0.8988  False
   NON    YES  -0.1664 -0.7314 0.3987  False
    RE    YES  -0.4774 -1.0688  0.114  False
--------------------------------------------
Name 8 Multiple Comparison of Means - Tukey HSD,FWER=0.05
============================================
group1 group2 meandiff   lower  upper reject
--------------------------------------------
   NON     RE  -0.0224  -0.668 0.6233  False
   NON    YES   0.0119  -0.668 0.6918  False
    RE    YES   0.0343 -0.6057 0.6742  False
--------------------------------------------
Name 9 Multiple Comparison of Means - Tukey HSD,FWER=0.05
============================================
group1 group2 meandiff   lower  upper reject
--------------------------------------------
   NON     RE  -0.2414 -0.7792 0.2963  False
   NON    YES   0.0696 -0.5746 0.7138  False
    RE    YES    0.311 -0.3129  0.935  False
--------------------------------------------
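With many names, printed Tukey tables get hard to scan. A rough sketch of one alternative is to gather the same numbers into a single DataFrame. It assumes the result object exposes the attributes groupsunique, meandiffs, confint and reject, and that comparisons are ordered like np.triu_indices over the sorted group labels. That is my understanding of statsmodels, but check it against your installed version. The variable name tukey_table and the seeded data are illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Seeded sample data with the same layout as the DataFrame above (illustrative only).
rng = np.random.default_rng(0)
df = pd.DataFrame({'Names': rng.integers(1, 10, 1000),
                   'value': rng.standard_normal(1000),
                   'condition': rng.choice(['NON', 'YES', 'RE'], 1000)})

rows = []
for name, grouped_df in df.groupby('Names'):
    result = pairwise_tukeyhsd(grouped_df['value'], grouped_df['condition'])
    groups = result.groupsunique
    # Assumption: pair k compares groups[i] with groups[j] in upper-triangle order.
    first_idx, second_idx = np.triu_indices(len(groups), 1)
    for k, (i, j) in enumerate(zip(first_idx, second_idx)):
        rows.append({'Names': int(name),
                     'group1': str(groups[i]),
                     'group2': str(groups[j]),
                     'meandiff': float(result.meandiffs[k]),
                     'lower': float(result.confint[k, 0]),
                     'upper': float(result.confint[k, 1]),
                     'reject': bool(result.reject[k])})

tukey_table = pd.DataFrame(rows)
print(tukey_table.head())
```

Selecting tukey_table[tukey_table['reject']] then lists only the comparisons where the null hypothesis is rejected at the default FWER of 0.05.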