
Updating factor levels while filtering R data.frames [duplicate]

I have a data.frame similar to the one below. I pre-process it by deleting rows that I am not interested in. Most of my columns are factors, and their levels are not updated as I filter the data.frame.

I can see that what I am doing below is not ideal. How do I get the factor levels to update as I modify the data.frame? Below is a demonstration of what goes wrong.

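# generate data
library(plyr)  # provides count(), which produces the "freq" column shown below
set.seed(2013)
df <- data.frame(site = sample(c("A","B","C"), 50, replace = TRUE),
                 currency = sample(c("USD", "EUR", "GBP", "CNY", "CHF"), 50, replace = TRUE, prob = c(10,6,5,6,0.5)),
                 value = ceiling(rnorm(50)*10),
                 stringsAsFactors = TRUE)  # needed in R >= 4.0, where strings are no longer factors by default

# check counts to see there is one entry where currency = CHF
count(df, vars="currency")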

>currency freq
>1      CHF    1
>2      CNY   13
>3      EUR   16
>4      GBP    6
>5      USD   14


# filter out all entries where site = A, i.e. take a subset of df
df <- df[!(df$site=="A"),]

# check counts again to see how this affected the currency frequencies
count(df, vars="currency")

>currency freq
>1      CNY   10
>2      EUR    8
>3      GBP    4
>4      USD   10

# But, the filtered data.frame's levels have not been updated:
levels(df$currency)

>[1] "CHF" "CNY" "EUR" "GBP" "USD"

levels(df$site)

>[1] "A" "B" "C"

Desired output:

# levels(df$currency) = "CNY" "EUR" "GBP" "USD"
# levels(df$site) = "B" "C"
Zhubarb asked Dec 10 '13 16:12




1 Answer

Use droplevels(). Called on a data.frame, it drops the unused levels from every factor column:

> df <- droplevels(df)
> levels(df$currency)
[1] "CNY" "EUR" "GBP" "USD"
> levels(df$site)
[1] "B" "C"
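
As a minimal sketch of how this can fit into the filtering step itself, the example below filters and drops unused levels in one call, and also shows two per-column variants. The small data frame and the names `filtered` and `sub` are made up for illustration and are not from the question.

```r
# Toy data (hypothetical, for illustration only)
df <- data.frame(
  site     = c("A", "B", "C", "B", "C"),
  currency = c("CHF", "USD", "EUR", "USD", "GBP"),
  value    = c(1, 2, 3, 4, 5),
  stringsAsFactors = TRUE  # strings are not factors by default in R >= 4.0
)

# Filter and drop unused levels in every factor column in one step
filtered <- droplevels(df[df$site != "A", ])
levels(filtered$site)      # "B" "C"
levels(filtered$currency)  # "EUR" "GBP" "USD"

# Drop unused levels in a single column only
sub <- df[df$site != "A", ]
sub$currency <- droplevels(sub$currency)
levels(sub$currency)       # "EUR" "GBP" "USD"
levels(sub$site)           # still "A" "B" "C"

# Re-creating the factor has the same effect on that one column
sub$site <- factor(sub$site)
levels(sub$site)           # "B" "C"
```

Wrapping the subset in droplevels() is probably the simplest choice when every factor should be cleaned up; the per-column forms may be preferable when some columns should keep their full set of levels, for example so that tables still show zero counts.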
A5C1D2H2I1M1N2O1R2T1 answered Nov 28 '22 03:11