My data has 3 independent variables, all of which are categorical:
condition: cond1, cond2, cond3
population: A, B, C
task: 1, 2, 3, 4, 5
The dependent variable is the task completion time. I ran lm(time ~ condition + user + task, data)
in R and got the following results:
What confuses me is that cond1, groupA, and task1 are left out of the results. From the thread "linear regression "NA" estimate just for last coefficient", I understand that one factor level is chosen as the "baseline" and shown in the (Intercept) row.
But what if there are multiple factor levels used as the baseline, as in the above case?
Generally, R² is the measure reported. What counts as a good R² depends on many factors. If I am running standards on a GC-MS, I should expect an R² of almost 1.0; a value of 0.8 would probably make the research unpublishable.
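For reference, here is a minimal Python sketch of the usual definition, R² = 1 - SS_res / SS_tot, where SS_res is the residual sum of squares and SS_tot is the total sum of squares around the mean. The observed and fitted values are made up for illustration:

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared distance of each point from its fitted value.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    # Total sum of squares: squared distance of each point from the overall mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1 - ss_res / ss_tot

# Hypothetical calibration-style data: a near-perfect fit gives R² close to 1.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.0, 4.0]
print(round(r_squared(observed, fitted), 3))  # 0.996
```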
The level of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis. By convention, a p-value less than or equal to 0.05 is considered statistically significant.
Each person in your population has exactly one value for each of the variables 'condition', 'population' and 'task', so the baseline individual must also have a value for each of these variables; in this case, cond1, A and t1. All of the results are expressed relative to this reference individual, so the intercept gives the model's estimated mean time for cond1, groupA and task1.
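A minimal Python sketch (an illustration, not R's own code) of how treatment coding, R's default for unordered factors, turns each observation into a row of the model matrix. The level names groupA to groupC and t1 to t5 are assumptions inferred from coefficient names such as usergroupB and taskt4. The first level of each factor gets no column of its own, so the cond1/groupA/t1 individual is represented by the intercept alone:

```python
# Assumed level order. By default R sorts character levels alphabetically and
# treats the first level of each factor as the baseline (no column of its own).
FACTORS = {
    "condition": ["cond1", "cond2", "cond3"],
    "user": ["groupA", "groupB", "groupC"],
    "task": ["t1", "t2", "t3", "t4", "t5"],
}

def column_names(factors):
    """Coefficient names in R's style: variable name followed by level name."""
    names = ["(Intercept)"]
    for var, levels in factors.items():
        names += [var + lvl for lvl in levels[1:]]
    return names

def design_row(factors, obs):
    """Treatment-coded row of the model matrix for one observation."""
    row = [1]  # the intercept column is always 1
    for var, levels in factors.items():
        # One 0/1 indicator per non-baseline level.
        row += [1 if obs[var] == lvl else 0 for lvl in levels[1:]]
    return row

names = column_names(FACTORS)
baseline = design_row(FACTORS, {"condition": "cond1", "user": "groupA", "task": "t1"})
other = design_row(FACTORS, {"condition": "cond1", "user": "groupB", "task": "t4"})
print([n for n, x in zip(names, baseline) if x])  # ['(Intercept)']
print([n for n, x in zip(names, other) if x])     # ['(Intercept)', 'usergroupB', 'taskt4']
```

Because every indicator is 0 for the baseline individual, no coefficient is ever estimated for cond1, groupA or t1 on its own.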
A coefficient or significance test for cond1, groupA or task1 on its own makes no sense: the test on each coefficient asks whether that group's mean differs from the reference group's mean, and you cannot compare the reference group against itself.
As your model has no interactions, the coefficient for groupB means that the mean time for somebody in population B is 9.33 (seconds?) higher than for somebody in population A, regardless of the condition and task they are performing. Because the p-value is very small, you can state that the mean time does in fact differ between people in population B and people in the reference population (A). If you added interaction terms to the model, these terms (for example usergroupB:taskt4) would indicate the extra amount added to (or subtracted from) the mean time for an individual who has both characteristics (in this example, an individual from population B who performed task 4). These effects would be added to the marginal ones (usergroupB and taskt4).
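To make the arithmetic concrete, here is a small Python sketch that adds up the intercept and whichever terms apply to a given individual. All coefficient values are hypothetical and are not taken from the question's output:

```python
def predicted_mean(coefs, active_terms):
    """Intercept plus every coefficient whose term applies to this individual."""
    return coefs["(Intercept)"] + sum(coefs.get(term, 0.0) for term in active_terms)

# Hypothetical coefficients for a model with a user x task interaction.
coefs = {
    "(Intercept)": 50.0,
    "usergroupB": 9.0,
    "taskt4": 5.0,
    "usergroupB:taskt4": -3.0,
}

# Reference individual (cond1, groupA, t1): only the intercept applies.
print(predicted_mean(coefs, []))  # 50.0
# groupB doing task 4: both marginal effects plus the interaction apply.
print(predicted_mean(coefs, ["usergroupB", "taskt4", "usergroupB:taskt4"]))  # 61.0
```

With these made-up numbers, a groupA individual doing task 4 would get 50 + 5 = 55, and a groupB individual doing task 1 would get 50 + 9 = 59. Only the groupB/task 4 combination picks up the interaction term.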
Hope I helped.
Does the (Intercept) row now indicate cond1+groupA+task1?
Yes.
What if I want to know the coefficient and significance for cond1, groupA, and task1 individually?
Think about what significance means. You need to formulate a hypothesis. In your example, everything is compared with the intercept, so your question doesn't really make sense. However, you can always conduct pairwise comparisons between all possible effect combinations (see the multcomp package).
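As a sketch of what such a comparison computes (an illustration, not multcomp's implementation): with treatment coding, groupC vs groupB is the contrast b_groupC - b_groupB, and its standard error needs the covariance between the two estimates, sqrt(Var(b_C) + Var(b_B) - 2 Cov(b_C, b_B)). The estimates and covariances below are hypothetical. In R they would come from coef() and vcov() of the fitted model, and, as far as I know, the summary of multcomp's glht() also adjusts the p-values for multiple comparisons by default, which this sketch does not do:

```python
import math

def contrast(coefs, vcov, weights):
    """Estimate and standard error of the linear combination sum(w_i * b_i)."""
    names = list(weights)
    est = sum(weights[n] * coefs[n] for n in names)
    # Var(w'b) = w' V w, summed over every pair of coefficients.
    var = sum(weights[a] * weights[b] * vcov[a][b] for a in names for b in names)
    return est, math.sqrt(var)

# Hypothetical estimates and covariance matrix, for illustration only.
coefs = {"usergroupB": 9.0, "usergroupC": 4.0}
vcov = {
    "usergroupB": {"usergroupB": 4.0, "usergroupC": 1.0},
    "usergroupC": {"usergroupB": 1.0, "usergroupC": 4.0},
}

# groupC vs groupB: neither level is the baseline, so compare b_C - b_B.
est, se = contrast(coefs, vcov, {"usergroupC": 1.0, "usergroupB": -1.0})
print(est, round(se, 3))  # -5.0 2.449
print(round(est / se, 2))  # Wald z statistic, unadjusted
```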
For example, groupB has an estimated coefficient of +9.3349. Is that compared with groupA, or with cond1+groupA+task1?
It's the difference between cond1/task1/groupA and cond1/task1/groupB. (As @Rufo correctly points out, it is of course an overall effect: the difference between groupB and groupA, provided the other effects are held equal.)
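One way to check this is to fit an additive model to noise-free, made-up data where the true groupB effect is known. The Python sketch below uses plain least squares via the normal equations as a stand-in for lm(), with only two factors to keep it short. It recovers the coefficients and shows that the groupB - groupA gap is the same for every task:

```python
from itertools import product

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(b, row):
    return sum(bi * xi for bi, xi in zip(b, row))

# Hypothetical noise-free data: baseline 50, groupB +9, t2 +3, t3 +7.
users, tasks = ["groupA", "groupB"], ["t1", "t2", "t3"]
task_effect = {"t1": 0.0, "t2": 3.0, "t3": 7.0}
X, y = [], []
for user, task in product(users, tasks):
    # Treatment coding: columns are (Intercept), usergroupB, taskt2, taskt3.
    X.append([1, int(user == "groupB"), int(task == "t2"), int(task == "t3")])
    y.append(50.0 + 9.0 * (user == "groupB") + task_effect[task])

b = ols(X, y)
print([round(v, 6) for v in b])  # [50.0, 9.0, 3.0, 7.0]

# In an additive model the groupB - groupA gap does not depend on the task.
for task in tasks:
    t2, t3 = int(task == "t2"), int(task == "t3")
    gap = predict(b, [1, 1, t2, t3]) - predict(b, [1, 0, t2, t3])
    print(task, round(gap, 6))  # 9.0 for every task
```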