
What are levels in R?

Tags: r, levels

I understand this is a very basic question but I don't understand what levels mean in R.

For reference, I wrote a simple script that reads a CSV table, filters on one of the fields, assigns the result to a new variable, and clears the memory allocated for the first variable. If I call unique() on the field I filtered on, I see that the results were indeed filtered, but there is an additional line, 'Levels', listing values from the original dataset.

Example:

df = read.csv(path, sep=",", header=TRUE)
df_intrate = df[df$AssetClass == "ASSET CLASS A", ]

rm(df)
gc()

unique(df_intrate$AssetClass)

Results:

[1] ASSET CLASS A
Levels: ASSET CLASS E ASSET CLASS D ASSET CLASS C ASSET CLASS B ASSET CLASS A

Is the structural information from df somehow preserved in df_intrate, even though RStudio shows that df_intrate has the expected number of rows for ASSET CLASS A?

ApplePie asked Oct 19 '17 13:10

1 Answer

Is the structural information from df somehow preserved in df_intrate, even though RStudio shows that df_intrate has the expected number of rows for ASSET CLASS A?

Yes. This is how R stores categorical variables, called factors: it keeps both the levels (a vector of all possible values) and the values actually taken. Subsetting a factor keeps the full set of levels, even those that no longer appear in the data:

x = factor(c('a', 'b', 'c', 'a', 'b', 'b'))
x
# [1] a b c a b b
# Levels: a b c

y = x[1]
y
# [1] a
# Levels: a b c
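
To make the storage concrete, here is a minimal sketch using the same x as above. It relies only on base R's as.integer() and levels(); the variable name codes is just for illustration. The values are held as integer codes that index into the levels vector, which is why a one-element subset still carries every level:

```r
x = factor(c('a', 'b', 'c', 'a', 'b', 'b'))

# The values are stored as integer codes that index into the levels.
codes = as.integer(x)
codes
# [1] 1 2 3 1 2 2

levels(x)
# [1] "a" "b" "c"

# Subsetting keeps the whole levels attribute, even for a single element.
y = x[1]
as.integer(y)
# [1] 1
levels(y)
# [1] "a" "b" "c"

# Looking the codes up in the levels recovers the original values.
levels(x)[codes]
# [1] "a" "b" "c" "a" "b" "b"
```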

You can get rid of unused levels with droplevels(), or by re-applying the factor function, creating a new factor out of only what is present:

droplevels(y)
# [1] a
# Levels: a

factor(y)
# [1] a
# Levels: a

You can also use droplevels on a data frame to drop all unused levels from all factor columns:

dat = data.frame(x = x)
str(dat)
# 'data.frame': 6 obs. of  1 variable:
#  $ x: Factor w/ 3 levels "a","b","c": 1 2 3 1 2 2

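# drop = FALSE keeps a data frame; with one column, dat[1, ] would return a bare factor
str(dat[1, , drop = FALSE])
# 'data.frame': 1 obs. of  1 variable:
#  $ x: Factor w/ 3 levels "a","b","c": 1

str(droplevels(dat[1, , drop = FALSE]))
# 'data.frame': 1 obs. of  1 variable:
#  $ x: Factor w/ 1 level "a": 1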
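
Applied to the situation in the question, this might look like the sketch below. The data frame is a made-up stand-in for the CSV (the Rate column and all values are hypothetical); only the AssetClass column name and the filter come from the question.

```r
# Hypothetical stand-in for the data read from the CSV in the question.
df = data.frame(
  AssetClass = factor(c("ASSET CLASS A", "ASSET CLASS B",
                        "ASSET CLASS A", "ASSET CLASS C")),
  Rate = c(1.5, 2.0, 1.7, 3.1)
)

# Filtering removes the rows but keeps every level of the factor column.
df_intrate = df[df$AssetClass == "ASSET CLASS A", ]
levels(df_intrate$AssetClass)
# [1] "ASSET CLASS A" "ASSET CLASS B" "ASSET CLASS C"

# droplevels() on the data frame discards the levels that no longer occur.
df_intrate = droplevels(df_intrate)
levels(df_intrate$AssetClass)
# [1] "ASSET CLASS A"
nrow(df_intrate)
# [1] 2
```

Another option is to avoid factors at read time with read.csv(path, stringsAsFactors = FALSE), which keeps text columns as plain character vectors with no levels. That is the default from R 4.0.0 onward.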

Though unrelated to your current issue, it is worth mentioning that factor has an optional levels argument, which specifies the levels of a factor and the order in which they go. This is useful if you want a specific order (perhaps for plotting or modeling), or if there are more possible levels than are actually present and you want to include them. If you don't specify the levels, the default is the sorted unique values, which means alphabetical order for character data.

x = c("agree", "disagree", "agree", "neutral", "strongly agree")
factor(x)
# [1] agree          disagree       agree          neutral        strongly agree
# Levels: agree disagree neutral strongly agree
## not a good order

factor(x, levels = c("disagree", "neutral", "agree", "strongly agree"))
# [1] agree          disagree       agree          neutral        strongly agree
# Levels: disagree neutral agree strongly agree
## better order

factor(x, levels = c("strongly disagree", "disagree", "neutral", "agree", "strongly agree"))
# [1] agree          disagree       agree          neutral        strongly agree
# Levels: strongly disagree disagree neutral agree strongly agree
## good order, more levels than are actually present
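
One place where levels that are not present in the data matter is tabulation. As a small sketch (the names lv, f and counts are only for illustration), base R's table() reports a count for every level, including a zero for a level that never occurs:

```r
x = c("agree", "disagree", "agree", "neutral", "strongly agree")
lv = c("strongly disagree", "disagree", "neutral", "agree", "strongly agree")
f = factor(x, levels = lv)

# table() counts every level in level order, so the unused one shows up as 0.
counts = table(f)
names(counts)[1]
# [1] "strongly disagree"
as.vector(counts)
# [1] 0 1 1 2 1
```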

You can use ?reorder and ?relevel (or just factor again) to change the order of levels for an already created factor.
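
A minimal sketch of those three options, using the same survey-style values as above. The scores vector is invented purely to give reorder() something to sort by; relevel(), reorder() and factor() are the base R functions mentioned.

```r
f = factor(c("agree", "disagree", "agree", "neutral", "strongly agree"))
levels(f)
# [1] "agree"          "disagree"       "neutral"        "strongly agree"

# relevel() moves one level to the front and leaves the rest in place.
f2 = relevel(f, ref = "neutral")
levels(f2)
# [1] "neutral"        "agree"          "disagree"       "strongly agree"

# Calling factor() again with explicit levels sets the full order.
f3 = factor(f, levels = c("disagree", "neutral", "agree", "strongly agree"))
levels(f3)
# [1] "disagree"       "neutral"        "agree"          "strongly agree"

# reorder() sorts levels by a summary (mean by default) of a second variable.
scores = c(4, 2, 4, 3, 5)  # hypothetical numeric score for each response
f4 = reorder(f, scores)
levels(f4)
# [1] "disagree"       "neutral"        "agree"          "strongly agree"
```

In all three cases only the order of the levels changes; the values themselves stay the same.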

Gregor Thomas answered Oct 04 '22 02:10