How to interpret R linear regression when there are multiple factor levels as the baseline? [closed]

My data has 3 independent variables, all of which are categorical:

condition: cond1, cond2, cond3

population: A,B,C

task: 1,2,3,4,5

The dependent variable is the task completion time. I run lm(time~condition+user+task,data) in R and get the following results:

[Image: screenshot of the summary() coefficient table; cond1, groupA, and task1 do not appear as rows.]

What confuses me is that cond1, groupA, and task1 are left out from the results. From the thread linear regression "NA" estimate just for last coefficient, I understand that one factor level is chosen as the "baseline" and shown in the (Intercept) row.

But what if there are multiple factor levels used as the baseline, as in the above case?

  • Does the (Intercept) row now indicate cond1+groupA+task1?
  • What if I want to know the coefficient and significance for cond1, groupA, and task1 individually?
  • For example, groupB has an estimated coefficient of +9.3349 — is that relative to groupA, or to cond1+groupA+task1?
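A minimal reproducible sketch of this setup (the data values and the data-frame name `d` are made up for illustration; the real data and output are in the screenshot above):

```r
# Simulate data with the same factor structure as in the question
set.seed(1)
d <- data.frame(
  condition  = factor(sample(c("cond1", "cond2", "cond3"), 100, replace = TRUE)),
  population = factor(sample(c("A", "B", "C"), 100, replace = TRUE)),
  task       = factor(sample(1:5, 100, replace = TRUE)),
  time       = rnorm(100, mean = 30, sd = 5)
)

fit <- lm(time ~ condition + population + task, data = d)
summary(fit)  # cond1, population A, and task 1 are absorbed into the (Intercept) row
```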
Ida asked Feb 10 '14




2 Answers

Every individual in your data has a value for each of the variables 'condition', 'population', and 'task', so the baseline individual must also have one level of each; in this case, cond1, A, and task 1. All coefficients are expressed relative to this baseline individual, so the intercept does give the mean value of time for cond1, groupA, and task1.

A separate coefficient or significance test for cond1, groupA, or task1 makes no sense, because significance here means a significantly different mean value between one group and the reference group. You cannot compare the reference group against itself.

As your model has no interactions, the coefficient for groupB means that the mean time for somebody in population B is 9.33 (seconds?) higher than for somebody in population A, regardless of the condition and task they are performing. Since the p-value is very small, you can conclude that the mean time really does differ between population B and the reference population (A). If you added an interaction term to the model, those terms (for example usergroupB:taskt4) would indicate the extra value added (or subtracted) to the mean time when an individual has both conditions (in this example, an individual from population B performing task 4). These effects would be added on top of the marginal ones (usergroupB and taskt4).
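One practical consequence: if you want the model to report comparisons against a different baseline, you can change the reference level with `relevel()` and refit. A sketch, assuming your data frame is called `data` with the factors named as in the question:

```r
# Make population B the baseline instead of A; all population
# coefficients are then differences from B (e.g. populationA would
# be roughly -9.33 given the output in the question)
data$population <- relevel(data$population, ref = "B")
fit <- lm(time ~ condition + population + task, data = data)
summary(fit)
```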

Hope I helped.

Rufo answered Nov 14 '22


Does the (Intercept) row now indicate cond1+groupA+task1?

Yes.

What if I want to know the coefficient and significance for cond1, groupA, and task1 individually?

Think about what significance means. You need to formulate a hypothesis. In your example everything is compared to the intercept and your question doesn't really make sense. However, you can always conduct pairwise comparisons between all possible effect combinations (see package multcomp).

For example, groupB has an estimated coefficient +9.3349, compared to groupA? Or compared to cond1+groupA+task1?

It's the difference between cond1/task1/groupA and cond1/task1/groupB. (As @Rufo correctly points out, it is of course an overall effect and actually the difference between groupB and groupA provided the other effects are equal.)
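You can check this "other effects held equal" reading directly with `predict()`. A sketch, assuming `fit` is the fitted model from the question's `lm()` call:

```r
# Predicted times for two individuals who differ only in population;
# because the model has no interactions, the gap is the same at every
# condition/task combination and equals the populationB coefficient
nd <- data.frame(condition = "cond1",
                 task = factor(1),
                 population = c("A", "B"))
diff(predict(fit, newdata = nd))
```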

Roland answered Nov 14 '22