I have a data.frame similar to the one below. I pre-process it by deleting rows that I am not interested in. Most of my columns are factors, whose levels are not updated as I filter the data.frame.
I can see that what I am doing below is not ideal. How do I get the factor levels to update as I modify the data.frame? Below is a demonstration of what is going wrong.
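# generate data
# count() below comes from the plyr package
library(plyr)
set.seed(2013)
# stringsAsFactors = TRUE is needed in R >= 4.0.0, where character columns
# are no longer converted to factors by default
df <- data.frame(site = sample(c("A","B","C"), 50, replace = TRUE),
                 currency = sample(c("USD", "EUR", "GBP", "CNY", "CHF"), 50, replace = TRUE, prob = c(10, 6, 5, 6, 0.5)),
                 value = ceiling(rnorm(50) * 10),
                 stringsAsFactors = TRUE)
# check counts to see there is one entry where currency = CHF
count(df, vars = "currency")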
#>   currency freq
#> 1      CHF    1
#> 2      CNY   13
#> 3      EUR   16
#> 4      GBP    6
#> 5      USD   14
# filter out all entries where site = A, i.e. take a subset of df
df <- df[!(df$site=="A"),]
# check counts again to see how this affected the currency frequencies
count(df, vars="currency")
#>   currency freq
#> 1      CNY   10
#> 2      EUR    8
#> 3      GBP    4
#> 4      USD   10
# But the filtered data.frame's factor levels have not been updated:
levels(df$currency)
#> [1] "CHF" "CNY" "EUR" "GBP" "USD"
levels(df$site)
#> [1] "A" "B" "C"
Desired output:
# levels(df$currency) = "CNY" "EUR" "GBP" "USD"
# levels(df$site) = "B" "C"
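The root cause can be reproduced on a single factor: subsetting a factor keeps its full set of levels, and functions such as table() keep reporting the now-unused levels with a zero count. The sketch below uses a small made-up factor (the names site, kept and rebuilt are hypothetical stand-ins for df$site) and shows that calling factor() again on the remaining values recomputes the levels.

```r
# Hypothetical small factor standing in for df$site
site <- factor(c("A", "B", "C", "A", "B"))

# Logical subsetting keeps every original level, even ones no longer present
kept <- site[site != "A"]
print(levels(kept))      # "A" "B" "C"

# table() still reports a zero count for the unused level "A"
print(table(kept))

# Re-creating the factor from the remaining values recomputes its levels
rebuilt <- factor(kept)
print(levels(rebuilt))   # "B" "C"
```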
How do I rename factor levels in R? The simplest way to rename multiple factor levels is to assign a new character vector to levels(). For example, to rename the factor levels "A", "B", and "C" to "Factor 1", "Factor 2", and "Factor 3", you can use: levels(your_df$Category1) <- c("Factor 1", "Factor 2", "Factor 3"). The new names are matched to the existing levels by position.
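A minimal sketch of positional renaming with levels<-, assuming a hypothetical factor named category whose levels are "A", "B" and "C". Each data value follows its level, so the element that was "C" becomes "Factor 3". Note that renaming does not remove unused levels; it only relabels them.

```r
# Hypothetical factor with levels "A", "B", "C" (sorted alphabetically)
category <- factor(c("A", "C", "B", "A"))
print(levels(category))   # "A" "B" "C"

# Assigning a new vector to levels() renames levels by position:
# "A" -> "Factor 1", "B" -> "Factor 2", "C" -> "Factor 3"
levels(category) <- c("Factor 1", "Factor 2", "Factor 3")

# The data values follow their levels
print(as.character(category))  # "Factor 1" "Factor 3" "Factor 2" "Factor 1"
```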
The droplevels() function in R removes unused factor levels. It is particularly useful when levels are no longer used after subsetting a vector or a data frame. For a factor, the signature is droplevels(x, exclude = if(anyNA(levels(x))) NULL else NA, ...), where x is the object from which to drop unused factor levels. The data.frame method, droplevels(x, except, exclude, ...), drops unused levels from every factor column except the columns whose indices are given in except.
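The sketch below applies droplevels() both to a single factor and to a whole data frame, using the except argument to keep the levels of one column intact. The small data frame is made up for illustration and only mimics the shape of the question's df.

```r
# Hypothetical data mimicking the question's df
df <- data.frame(
  site     = factor(c("A", "B", "C", "B")),
  currency = factor(c("CHF", "USD", "EUR", "USD"))
)

# Remove rows where site == "A"; the factor columns keep all old levels
df <- df[df$site != "A", ]
print(levels(df$site))                 # "A" "B" "C"

# droplevels() on a single factor returns a factor without unused levels
print(levels(droplevels(df$site)))     # "B" "C"

# droplevels() on a data frame drops unused levels in every factor column,
# except the columns listed in `except` (here column 2, currency)
partial <- droplevels(df, except = 2)
print(levels(partial$site))            # "B" "C"
print(levels(partial$currency))        # "CHF" "EUR" "USD"
```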
To subset a data frame with base R's extraction operator [, put a logical expression built with the standard R operators in the rows slot. If you subset only by rows or only by columns, leave the other slot blank. For example, to subset the data frame d by rows only, the general form reduces to d[rows, ].
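A short sketch of row-only and column-only extraction with [, using a hypothetical data frame d with one factor column grp. It also shows that row subsetting leaves factor levels untouched, so a droplevels() call is still needed afterwards.

```r
# Hypothetical data frame `d` with a factor column
d <- data.frame(
  grp = factor(c("x", "y", "z", "y")),
  val = c(4, 7, 1, 9)
)

# Row-only subsetting: logical condition in the row slot, column slot blank
big <- d[d$val > 3, ]
print(nrow(big))          # 3
print(levels(big$grp))    # still "x" "y" "z"

# Column-only subsetting: row slot blank (a single column becomes a vector)
vals <- d[, "val"]

# Row subsetting does not update factor levels, so drop them explicitly
big <- droplevels(big)
print(levels(big$grp))    # "x" "y"
```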
Use droplevels():
> df <- droplevels(df)
> levels(df$currency)
[1] "CNY" "EUR" "GBP" "USD"
> levels(df$site)
[1] "B" "C"
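Two other base R routes give the same result, sketched here on hypothetical data. For a single factor, the [ method for factors accepts drop = TRUE, which removes unused levels at subsetting time. For a data frame, filtering and dropping can be combined in one expression by wrapping subset() in droplevels(); subset() is a convenience function mainly intended for interactive use, so the df[rows, ] form is often preferred in scripts.

```r
# Hypothetical factor standing in for df$site
site <- factor(c("A", "B", "C", "A", "C"))

# drop = TRUE in `[` removes unused levels while subsetting
kept <- site[site != "A", drop = TRUE]
print(levels(kept))              # "B" "C"

# Filter rows and drop unused levels in a single expression
df <- data.frame(site = site, value = c(3, -1, 8, 2, 5))
filtered <- droplevels(subset(df, site != "A"))
print(levels(filtered$site))     # "B" "C"
print(nrow(filtered))            # 3
```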