I like dplyr for data manipulation, but I don't understand how to use it for programming. For example, to rescale some variables, we could do:
mutate(cars, speed.scaled = scale(speed), dist.scaled = scale(dist))
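For reference, scale() with its default arguments centres a column on its mean and divides it by its standard deviation, so the call above can be checked against a hand-written version. This is a minimal sketch; the object names with_scale and by_hand are illustrative only.

```r
library(dplyr)

# scale() returns a one-column matrix; as.numeric() drops the matrix shape
with_scale <- mutate(cars,
                     speed.scaled = as.numeric(scale(speed)),
                     dist.scaled  = as.numeric(scale(dist)))

# The same standardisation written out by hand: (x - mean) / sd
by_hand <- mutate(cars,
                  speed.scaled = (speed - mean(speed)) / sd(speed),
                  dist.scaled  = (dist - mean(dist)) / sd(dist))

all.equal(with_scale, by_hand)  # TRUE, up to floating-point tolerance
```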
Very cool. But now suppose I want to write a function that uses mutate to scale all variables in a data frame. How do I create the ... argument? The best thing I can come up with is something like:
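# Build one unevaluated call scale(<column>) per column of cars
fnargs <- lapply(names(cars), function(x){call("scale", as.name(x))})
# Name each call after the new column it should create
names(fnargs) <- paste0(names(cars), ".scaled")
# Equivalent to mutate(cars, speed.scaled = scale(speed), dist.scaled = scale(dist))
do.call(mutate, c(.data=as.name("cars"), fnargs))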
Or is there an alternative interface that is more programming friendly?
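One alternative to do.call() that keeps the same list of calls is to splice it straight into mutate() with the !!! (splice) operator, which dplyr's tidy-evaluation machinery understands. This is a minimal sketch that rebuilds fnargs exactly as in the question; spliced is an illustrative name.

```r
library(dplyr)

# One call scale(<column>) per column, built exactly as in the question
fnargs <- lapply(names(cars), function(x) call("scale", as.name(x)))
names(fnargs) <- paste0(names(cars), ".scaled")

# !!! splices the named list into mutate()'s ... as if typed by hand:
# mutate(cars, speed.scaled = scale(speed), dist.scaled = scale(dist))
spliced <- mutate(cars, !!!fnargs)
names(spliced)
```

Because the list is spliced into the ... argument, there is no need to pass the data frame as a symbol the way the do.call() version does.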
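Easy peasy: use mutate_each(cars, funs(scale)) or apply(cars, 2, scale). Note that mutate_each() and funs() are deprecated in current dplyr in favour of across(), and that apply() returns a matrix rather than a data frame.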
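In dplyr 1.0.0 and later, across() replaces mutate_each()/funs(): it applies a function to a tidyselect selection of columns, and its .names argument controls what the new columns are called. Below is a minimal sketch of the function the question asks for; scale_cols is a hypothetical name, and the {{ }} ("embracing") operator forwards the caller's column selection into across().

```r
library(dplyr)

# Hypothetical helper: scale the columns picked by `cols` and store each
# result next to the original under the name "<column>.scaled"
scale_cols <- function(df, cols) {
  mutate(df, across({{ cols }}, ~ as.numeric(scale(.x)), .names = "{.col}.scaled"))
}

all_scaled <- scale_cols(cars, everything())  # scales every column
speed_only <- scale_cols(cars, speed)         # scales only speed
```

Passing the selection as an argument keeps the helper usable on any data frame, instead of hard-coding column names the way the do.call() version does.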
This can be done in base R like this:
cars.scaled <- as.data.frame(scale(cars))
or
cars.scaled <- replace(cars, TRUE, lapply(cars, scale))
or
cars.scaled <- cars
cars.scaled[] <- lapply(cars, scale)
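A quick sketch to confirm that the three base R variants above produce the same values. The names a, b, c2 and same_values are illustrative; values are compared through as.numeric() so that any matrix shape or attributes carried over from scale() do not affect the check.

```r
# Three base R ways to standardise every column of cars
a <- as.data.frame(scale(cars))

b <- replace(cars, TRUE, lapply(cars, scale))

c2 <- cars
c2[] <- lapply(cars, scale)

# as.numeric() strips any matrix shape or attributes left behind by scale(),
# so only the values are compared
same_values <- function(x, y) isTRUE(all.equal(as.numeric(x), as.numeric(y)))
same_values(a$speed, b$speed)  # TRUE
same_values(a$dist, c2$dist)   # TRUE
```

If plain numeric columns are preferred, lapply(cars, function(x) as.numeric(scale(x))) avoids carrying scale()'s matrix return value into the data frame.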
The first one above can be translated to work with %>% like this:
cars.scaled <- cars %>% scale %>% as.data.frame
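The %>% pipe comes from magrittr (dplyr re-exports it). Since R 4.1.0 the same chain can also be written with the base native pipe |>, which requires the right-hand side to be a call, so the parentheses are mandatory. A minimal sketch comparing the two; x1 and x2 are illustrative names.

```r
library(magrittr)  # provides %>%

# magrittr pipe, as in the answer above
x1 <- cars %>% scale %>% as.data.frame

# Base R native pipe (R >= 4.1.0): scale() and as.data.frame() need their ()
x2 <- cars |> scale() |> as.data.frame()

identical(x1, x2)  # TRUE
```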