Right now, I have the following data.frame which was created by original.df %.% group_by(Category) %.% tally() %.% arrange(desc(n))
.
DF <- structure(list(Category = c("E", "K", "M", "L", "I", "A",
"S", "G", "N", "Q"), n = c(163051, 127133, 106680, 64868, 49701,
47387, 47096, 45601, 40056, 36882)), .Names = c("Category",
"n"), row.names = c(NA, 10L), class = c("tbl_df", "tbl", "data.frame"
))
Category n
1 E 163051
2 K 127133
3 M 106680
4 L 64868
5 I 49701
6 A 47387
7 S 47096
8 G 45601
9 N 40056
10 Q 36882
I want to create an "Other" field from the bottom ranked Categories by n. i.e.
Category n
1 E 163051
2 K 127133
3 M 106680
4 L 64868
5 I 49701
6 Other 217022
Right now, I am doing
rbind(filter(DF, rank(rev(n)) <= 5),
summarise(filter(DF, rank(rev(n)) > 5), Category = "Other", n = sum(n)))
which collapses all categories not in the top 5 into the Other category.
But I'm curious whether there's a better way in dplyr
or some other existing package. By "better" I mean more succinct/readable. I'm also interested in methods with cleverer or more flexible ways to choose Other
.
Select the “Add column” button to see the “New field” option. This button will open the menu to create a new custom field. From here, select the type of custom field you would like to create. Give your field a name select ‘Create’ to create your new custom field.
The form will force an answer to the question. The form will accept/record the " Other " field, only if there is input in it. The form will not accept (let alone record) the answer when the " Other " field is selected but left blank .
Create Fields in PDF Forms You should click the "Add Text Field" in "Form" tab; toggle the cursor over the page in the PDF document so as to add an interactive field. Right-click the field to choose properties, then you can set its colors, style, appearance and actions etc.
How to Create Fields in Word. This information is only applicable to Word 2013 and above. Adding a Legacy Text Field in Microsoft Word: Step one: Under the "Developer" tab, go to the Word "Ribbon", click the icon for "Legacy Tools". Then, you will see a drop-down menu.
This is another approach, assuming that each category (of the top 5 at least) only occurs once:
df %.%
arrange(desc(n)) %.% #you could skip this step since you arranged the input df already according to your question
mutate(Category = ifelse(1:n() > 5, "Other", Category)) %.%
group_by(Category) %.%
summarize(n = sum(n))
# Category n
#1 E 163051
#2 I 49701
#3 K 127133
#4 L 64868
#5 M 106680
#6 Other 217022
Edit:
I just noticed that my output is not order by decreasing n
any more. After running the code again, I found out that the order is kept until after the group_by(Category)
but when I run the summarize
afterwards, the order is gone (or rather, it seems to be ordered by Category
). Is that supposed to be like that?
Here are three more ways:
m <- 5 #number of top results to show in final table (excl. "Other")
n <- m+1
#preserves the order (or better: reesatblishes it by index)
df <- arrange(df, desc(n)) %.% #this could be skipped if data already ordered
mutate(idx = 1:n(), Category = ifelse(idx > m, "Other", Category)) %.%
group_by(Category) %.%
summarize(n = sum(n), idx = first(idx)) %.%
arrange(idx) %.%
select(-idx)
#doesnt preserve the order (same result as in first dplyr solution, ordered by Category)
df[order(df$n, decreasing=T),] #this could be skipped if data already ordered
df[n:nrow(df),1] <- "Other"
df <- aggregate(n ~ Category, data = df, FUN = "sum")
#preserves the order (without extra index)
df[order(df$n, decreasing=T),] #this could be skipped if data already ordered
df[n:nrow(df),1] <- "Other"
df[n,2] <- sum(df$n[df$Category == "Other"])
df <- df[1:n,]
Different package/different syntax version:
library(data.table)
dt = as.data.table(DF)
dt[order(-n), # your data is already sorted, so this does nothing for it
if (.BY[[1]]) .SD else list("Other", sum(n)),
by = 1:nrow(dt) <= 5][, !"nrow", with = F]
# Category n
#1: E 163051
#2: K 127133
#3: M 106680
#4: L 64868
#5: I 49701
#6: Other 217022
This function modifies a column, replacing the infrequent entries with Other
, either by specifying a minimum frequency, or by specifying the resultant number of categories intended.
#' @title Group infrequent entries into 'Other category'
#' @description Useful when you want to constrain the number of unique values in a column.
#' @param .data Data containing variable.
#' @param var Variable containing infrequent entries, to be collapsed into "Other".
#' @param n Threshold for total number of categories above "Other".
#' @param count Threshold for total count of observations before "Other".
#' @param by Extra variables to group by when calculating \code{n} or \code{count}.
#' @param copy Should \code{.data} be copied? Currently only \code{TRUE} is supported.
#' @param other.category Value that infrequent entries are to be collapsed into. Defaults to \code{"Other"}.
#' @return \code{.data} but with \code{var} changed to be grouped into smaller categories.
#' @export
mutate_other <- function(.data, var, n = 5, count, by = NULL, copy = TRUE, other.category = "Other"){
stopifnot(is.data.table(.data),
is.character(other.category),
identical(length(other.category), 1L))
had.key <- haskey(.data)
if (!isTRUE(copy)){
stop("copy must be TRUE")
}
out <- copy(.data)
if (had.key){
orig_key <- key(out)
} else {
orig_key <- "_order"
out[, "_order" := 1:.N]
setkeyv(out, "_order")
}
if (is.character(.data[[var]])){
stopifnot(!("nvar" %in% names(.data)),
var %in% names(.data))
N <- .rank <- NULL
n_by_var <-
out %>%
.[, .N, keyby = c(var, by)] %>%
.[, .rank := rank(-N)]
out <- merge(out, n_by_var, by = c(var, by))
if (missing(count)){
out[, (var) := dplyr::if_else(.rank <= n, out[[var]], other.category)]
} else {
out[, (var) := dplyr::if_else(N >= count, out[[var]], other.category)]
}
out <-
out %>%
.[, N := NULL] %>%
.[, .rank := NULL]
setkeyv(out, orig_key)
if (!had.key){
out[, (orig_key) := NULL]
setkey(out, NULL)
}
out
} else {
warning("Attempted to use by = on a non-character vector. Aborting.")
return(.data)
}
}
https://github.com/HughParsonage/hutils/blob/master/R/mutate_other.R
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With