Factors ordered vs. levels

Tags:

r

Can someone explain what the "ordered" parameter of factor() is for in R?

R says:

ordered
logical flag to determine if the levels should be regarded as ordered (in the order given).

So if I have a factor called names and set ordered = TRUE:

names <- factor(c("fred", "bob", "john"), ordered = TRUE)
names

Why does it print out:

[1] fred bob  john
Levels: bob < fred < john

It does not seem to use the order I gave it: I started with fred, but the levels start with bob.

Also, how is the "ordered" parameter different from the "levels" parameter, which also puts the levels in a given order?

names <- factor(c("fred", "bob", "john"), levels = c("john", "fred", "bob"))
names

This returns:

[1] fred bob  john
Levels: john fred bob

That seems to follow my ordering, so why do I need the "ordered" parameter?

I am confused about how "ordered" and "levels" are used.


asked Apr 30 '14 19:04

1 Answer

I'll replace your vector of names with a more intuitive variable, one for which order makes more sense:

heights <- c("low","medium","high")

heights1 <- factor(heights, ordered = TRUE)
heights1
# [1] low    medium high  
# Levels: high < low < medium

heights2 <- factor(heights) # ordered = FALSE by default
heights2
# [1] low    medium high  
# Levels: high low medium

The order of the levels might not be the one you expect: when you don't set the levels explicitly, they are sorted alphabetically, whether or not the factor is ordered.
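Applied to the vector from the question, a minimal sketch of that default looks like this. The variable names x, f_unordered and f_ordered are only illustrative, and the exact sort order can depend on your locale for mixed-case or accented strings:

```r
x <- c("fred", "bob", "john")

# No levels given: factor() falls back to the sorted unique values of x
f_unordered <- factor(x)
f_ordered   <- factor(x, ordered = TRUE)

levels(f_unordered)
levels(f_ordered)

# Both level sets match sort(unique(x)); ordered = TRUE did not change them
identical(levels(f_unordered), sort(unique(x)))
identical(levels(f_ordered), sort(unique(x)))
```

So ordered = TRUE only declares that the levels have a ranking. Which ranking that is still comes from the levels argument or, failing that, from the alphabetical default.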

To set an explicit order we can do as follows:

heights1 <- factor(heights, levels = heights, ordered = TRUE)
heights1
# [1] low    medium high  
# Levels: low < medium < high

heights2 <- factor(heights, levels = heights)
heights2
# [1] low    medium high  
# Levels: low medium high

You might sometimes want to use factor(x, levels = unique(x)) instead, because levels can't be duplicated. In that case the levels are ordered by first appearance.
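For instance, with a vector that contains repeated values (x here is a made-up example, not data from the question), a small sketch would be:

```r
x <- c("low", "high", "low", "medium", "high")

# unique(x) keeps the first occurrence of each value, in order of appearance
f <- factor(x, levels = unique(x))

levels(f)      # "low" "high" "medium"
as.integer(f)  # 1 2 1 3 2
```

Passing levels = x directly would not be a good idea here, since x contains duplicates.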

So now the levels are in the order we want for both factors, even though one of them is supposed to be "unordered". The vocabulary is misleading: the levels of an unordered factor still have a sequence, and setting it is possible and even useful, for instance if you want to tweak your layouts with ggplot2.
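To make the distinction concrete, here is a rough sketch comparing the two kinds of factor. The names unordered and ordered_f are mine, chosen for illustration:

```r
heights <- c("low", "medium", "high")

unordered <- factor(heights, levels = heights)
ordered_f <- factor(heights, levels = heights, ordered = TRUE)

class(unordered)  # "factor"
class(ordered_f)  # "ordered" "factor"

# Both keep the level sequence, so sorting follows it either way
sort(unordered, decreasing = TRUE)
sort(ordered_f, decreasing = TRUE)

# Only the ordered factor supports comparisons between levels
ordered_f < "high"                    # TRUE TRUE FALSE
suppressWarnings(unordered < "high")  # NA NA NA (normally with a warning)
```

In other words, levels controls the sequence used for display and sorting, while ordered controls whether R treats that sequence as a ranking it can compare on.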

However, as mentioned by @joran and @thomas, statistical models treat categorical variables differently depending on whether they are ordered or not.
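One place where this shows up is the contrast coding that modelling functions such as lm() derive from a factor. The sketch below assumes the default value of options("contrasts") and uses illustrative variable names:

```r
heights <- c("low", "medium", "high")

unordered <- factor(heights, levels = heights)
ordered_f <- factor(heights, levels = heights, ordered = TRUE)

# Default coding schemes, one for unordered and one for ordered factors
getOption("contrasts")

# Unordered: treatment (dummy) coding, one column per non-reference level
contrasts(unordered)

# Ordered: polynomial coding, with linear (.L) and quadratic (.Q) columns
contrasts(ordered_f)
```

With these defaults, a model fitted on the ordered factor estimates linear and quadratic trends across the levels instead of one coefficient per level.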

What led me here, however, is the use of ordered factors with the max and min functions, in particular inside aggregation functions.

See this question and its accepted answer, where defining the factor as ordered is necessary: Aggregate with max and factors

We had this:

# > df1
#    id height
# 1   1    low          
# 2   1   high         
# 3   2 medium          
# 4   2    low          
# 5   3 medium          
# 6   3 medium          
# 7   4    low          
# 8   4    low          
# 9   5 medium          
# 10  5 medium

With unordered factors we couldn't aggregate:

# aggregate(height ~ id,df1,max)
# Error in Summary.factor(c(2L, 2L), na.rm = FALSE) : 
# ‘max’ not meaningful for factors

With ordered factors we can!

# aggregate(height ~ id,df1,max)
#   id height
# 1  1   high
# 2  2 medium
# 3  3 medium
# 4  4    low
# 5  5 medium
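If you want to reproduce this, here is a sketch that rebuilds df1 from the printout above and runs both cases. The helper names lvls, unordered_result and ordered_result are my own, and the level order low < medium < high is the assumption that matches the result shown:

```r
df1 <- data.frame(
  id = rep(1:5, each = 2),
  height = c("low", "high", "medium", "low", "medium",
             "medium", "low", "low", "medium", "medium"),
  stringsAsFactors = FALSE
)
lvls <- c("low", "medium", "high")

# Unordered factor: max() is not defined, so aggregate() fails.
# tryCatch() captures the error message instead of stopping the script.
df1$height <- factor(df1$height, levels = lvls)
unordered_result <- tryCatch(
  aggregate(height ~ id, df1, max),
  error = function(e) conditionMessage(e)
)
unordered_result

# Ordered factor: max() picks the highest-ranked level within each id
df1$height <- factor(df1$height, levels = lvls, ordered = TRUE)
ordered_result <- aggregate(height ~ id, df1, max)
ordered_result
```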


answered Oct 08 '22 19:10

Moody_Mudskipper

Question asked by user3022875

People also ask

1 Answers

Moody_Mudskipper

Recent Activity

Donate For Us