Can someone explain what the "ordered" parameter in R is used for?
The R help page for factor() says:
ordered
logical flag to determine if the levels should be regarded as ordered (in the order given).
So if I create a factor called names and set ordered = TRUE:
names <- factor(c("fred", "bob", "john"), ordered = TRUE)
names
Why does it print out:
[1] fred bob john
Levels: bob < fred < john
It seems it did not use the order I gave it: I started with fred, but the levels start with bob. Why?
Also, how is the ordered parameter different from the levels parameter, which also orders the factor levels:
names <- factor(c("fred", "bob", "john"), levels = c("john", "fred", "bob"))
names
This returns
[1] fred bob john
Levels: john fred bob
Which seems like it is following my ordering. So why do I need the "ordered" parameter?
I am confused as to how "ordered" and "levels" are used.
A factor is another way of referring to a categorical variable. Factor levels are all of the values that the factor can take (recall that a categorical variable has a fixed number of groups). In a designed experiment, each treatment is one combination of factor levels.
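Ordered factors are an extension of factors in which the levels have a meaningful order: they are treated as increasing in the order in which they are stored (for example low < medium < high). You can create them either with factor() and its ordered = TRUE argument, or with the ordered() function.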
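A minimal sketch of the two routes, using an illustrative vector h that is not from the question. Since ordered() simply calls factor() with ordered = TRUE, both should produce identical objects:

```r
h <- c("low", "medium", "high")

# Route 1: factor() with its ordered argument
a <- factor(h, levels = h, ordered = TRUE)

# Route 2: the ordered() function
b <- ordered(h, levels = h)

identical(a, b)  # TRUE
class(a)         # "ordered" "factor"
```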
A factor of an experiment is a controlled independent variable: a variable whose levels are set by the experimenter. A factor is a general type or category of treatments, and different treatments constitute different levels of a factor.
The ordered() function creates such ordered factors but is otherwise identical to factor(). For most purposes the only difference between ordered and unordered factors is that the former are printed showing the ordering of the levels, but the contrasts generated for them in fitting linear models are different.
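A small sketch of what the ordering gives you beyond printing, again with an illustrative vector h: comparison operators such as < are defined for ordered factors, while for unordered factors they return NA with a warning.

```r
h <- c("low", "medium", "high")
u <- factor(h, levels = h)                  # unordered, levels in a chosen order
o <- factor(h, levels = h, ordered = TRUE)  # ordered

print(u)  # Levels: low medium high
print(o)  # Levels: low < medium < high

# Comparisons follow the level order of the ordered factor
o < "high"                    # TRUE TRUE FALSE

# For an unordered factor '<' is not meaningful: NA plus a warning
suppressWarnings(u < "high")  # NA NA NA
```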
I'll replace your vector of names with a more intuitive example, where order makes more sense:
heights <- c("low","medium","high")
heights1 <- factor(heights, ordered = TRUE)
heights1
# [1] low medium high
# Levels: high < low < medium
heights2 <- factor(heights) # ordered = FALSE by default
heights2
# [1] low medium high
# Levels: high low medium
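The order of the levels might not be the one you expect: when you don't set the levels explicitly, they are sorted alphabetically, whether or not ordered = TRUE is set. That is why your names example printed bob < fred < john.
To set an explicit order, pass it through the levels argument: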
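A quick sketch with the names from the question (stored here as nms to avoid masking base::names) showing that ordered = TRUE does not change which order the levels are in; only the levels argument does:

```r
nms <- c("fred", "bob", "john")

# Without an explicit levels argument the levels are the sorted unique
# values, with or without ordered = TRUE
levels(factor(nms))                  # "bob"  "fred" "john"
levels(factor(nms, ordered = TRUE))  # "bob"  "fred" "john"

# ordered = TRUE only marks the level order as meaningful; to keep the
# order of appearance, pass it explicitly through levels
levels(factor(nms, levels = nms, ordered = TRUE))  # "fred" "bob"  "john"
```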
heights1 <- factor(heights, levels = heights, ordered = TRUE)
heights1
# [1] low medium high
# Levels: low < medium < high
heights2 <- factor(heights, levels = heights)
heights2
# [1] low medium high
# Levels: low medium high
If x contains repeated values, you can't pass x itself as levels, because levels can't be duplicated. Use factor(x, levels = unique(x)) instead; the levels are then ordered by their first appearance.
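A minimal sketch with a made-up vector x containing repeats. Passing x directly as levels is an error in current R versions, while unique(x) keeps each value once, in order of first appearance:

```r
x <- c("medium", "low", "medium", "high", "low")

# Using x itself as levels fails because it contains duplicates
msg <- tryCatch(factor(x, levels = x), error = function(e) conditionMessage(e))

# unique(x) keeps each value once, in order of first appearance
f <- factor(x, levels = unique(x))
levels(f)  # "medium" "low" "high"
```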
So now the levels are in our chosen order in both cases, but wait, heights2 is supposed to be "unordered".
The vocabulary is misleading: the levels of an unordered factor can still be arranged in any order you choose, which is even useful, for instance, to tweak the layout of plots made with ggplot2.
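A sketch separating the two ideas, rebuilding heights1 and heights2 as above. The level order can be identical while only one factor carries the "ordered" flag. As a dependency-free stand-in for a plot layout, table() is used here (plotting packages such as ggplot2 likewise follow the level order on discrete axes); the vector x is made up for illustration:

```r
heights <- c("low", "medium", "high")
heights1 <- factor(heights, levels = heights, ordered = TRUE)
heights2 <- factor(heights, levels = heights)

# Same level order for both...
identical(levels(heights1), levels(heights2))  # TRUE
# ...but only heights1 carries the "ordered" flag
is.ordered(heights1)  # TRUE
is.ordered(heights2)  # FALSE

# Display order follows the level order, ordered or not
x <- c("high", "low", "medium", "low")
names(table(x))                            # "high" "low" "medium"
names(table(factor(x, levels = heights)))  # "low" "medium" "high"
```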
However, as mentioned by @joran and @thomas, statistical models treat categorical variables differently depending on whether they are ordered or not.
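A minimal sketch of that difference, assuming R's default contrasts setting (contr.treatment for unordered factors, contr.poly for ordered ones). The same data yields different design-matrix columns depending only on the ordered flag; x and lev are illustrative:

```r
lev <- c("low", "medium", "high")
x <- c("low", "medium", "high", "medium")
u <- factor(x, levels = lev)                  # unordered
o <- factor(x, levels = lev, ordered = TRUE)  # ordered

# Default: c(unordered = "contr.treatment", ordered = "contr.poly")
getOption("contrasts")

# Dummy columns compared with the baseline level "low"
colnames(model.matrix(~ u))  # "(Intercept)" "umedium" "uhigh"
# Linear (.L) and quadratic (.Q) trend columns
colnames(model.matrix(~ o))  # "(Intercept)" "o.L" "o.Q"
```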
The use of ordered factors that led me here, however, is with the max and min functions, in particular inside aggregation functions.
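A minimal sketch of max, min and range on their own, with illustrative data. They work on ordered factors and return an ordered factor, while on an unordered factor they raise an error:

```r
lev <- c("low", "medium", "high")
o <- factor(c("medium", "low", "high", "low"), levels = lev, ordered = TRUE)
u <- factor(c("medium", "low", "high", "low"), levels = lev)

max(o)    # high (still an ordered factor)
min(o)    # low
range(o)  # low high

# The same call on an unordered factor is an error
tryCatch(max(u), error = function(e) conditionMessage(e))
```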
See this question and its accepted answer, where defining the factor as ordered is necessary: Aggregate with max and factors
We had this:
# > df1
# id height
# 1 1 low
# 2 1 high
# 3 2 medium
# 4 2 low
# 5 3 medium
# 6 3 medium
# 7 4 low
# 8 4 low
# 9 5 medium
# 10 5 medium
With unordered factors we couldn't aggregate:
# aggregate(height ~ id,df1,max)
# Error in Summary.factor(c(2L, 2L), na.rm = FALSE) :
# ‘max’ not meaningful for factors
With ordered factors we can!
# aggregate(height ~ id,df1,max)
# id height
# 1 1 high
# 2 2 medium
# 3 3 medium
# 4 4 low
# 5 5 medium
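For a reproducible version, here is a sketch that rebuilds df1 from the printout above (the column types are an assumption: integer id and height as an ordered factor) and runs the same aggregate() call with both factor types, as behaves in current R versions:

```r
# Rebuild df1 from the printout, storing height as an ordered factor
df1 <- data.frame(
  id = rep(1:5, each = 2),
  height = factor(c("low", "high", "medium", "low", "medium",
                    "medium", "low", "low", "medium", "medium"),
                  levels = c("low", "medium", "high"), ordered = TRUE)
)

# Highest height per id, using the level order low < medium < high
res <- aggregate(height ~ id, data = df1, FUN = max)
res

# Same data with an unordered factor: Summary.factor refuses max()
df2 <- df1
df2$height <- factor(df1$height, ordered = FALSE)
tryCatch(aggregate(height ~ id, data = df2, FUN = max),
         error = function(e) conditionMessage(e))
```

Note that factor() keeps an ordered factor ordered by default (its ordered argument defaults to is.ordered(x)), which is why ordered = FALSE is passed explicitly when building df2.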