
What does the dplyr period character "." reference?

What does the period . reference in the following dplyr code?

    (df <- as.data.frame(matrix(rep(1:5, 5), ncol=5)))
    #    V1 V2 V3 V4 V5
    #  1  1  1  1  1  1
    #  2  2  2  2  2  2
    #  3  3  3  3  3  3
    #  4  4  4  4  4  4
    #  5  5  5  5  5  5

    dplyr::mutate_each(df, funs(. == 5))
    #       V1    V2    V3    V4    V5
    #  1 FALSE FALSE FALSE FALSE FALSE
    #  2 FALSE FALSE FALSE FALSE FALSE
    #  3 FALSE FALSE FALSE FALSE FALSE
    #  4 FALSE FALSE FALSE FALSE FALSE
    #  5  TRUE  TRUE  TRUE  TRUE  TRUE

Is this shorthand for "all columns"? Is this . specific dplyr syntax or is it general R syntax (as discussed here)?

Also, why does the following code result in an error?

    dplyr::filter(df, . == 5)
    #  Error: object '.' not found
Megatron asked Feb 08 '16 14:02


1 Answer

The dot is used within dplyr mainly (but not exclusively) in mutate_each, summarise_each and do. In the first two (and their SE counterparts) it stands for each column, in turn, to which the functions in funs are applied; it is not shorthand for "all columns" at once. In do it refers to the (potentially grouped) data.frame, so you can reference a single column named "xyz" with .$xyz.
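As a rough illustration of both uses, here is a minimal sketch. It assumes dplyr 1.0.0 or later, where mutate_each and funs are deprecated and across() with a formula-style lambda plays the same role. The names is_five and mpg_by_cyl are made up for this example.

```r
library(dplyr)

df <- as.data.frame(matrix(rep(1:5, 5), ncol = 5))

# Column-wise use: inside the lambda, `.` is one column at a time
# (a vector), not the whole data frame.
is_five <- mutate(df, across(everything(), ~ . == 5))

# do(): here `.` is the data frame for the current group,
# so `.$mpg` extracts that group's mpg column.
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  do(data.frame(mean_mpg = mean(.$mpg)))
```

In the first call the comparison runs once per column; in the second, the expression inside do runs once per group of cyl.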

The reason you cannot run

    filter(df, . == 5)

is twofold: a) filter is not designed to work across multiple columns the way mutate_each is, and b) the dot is only defined when the call is made through the pipe operator %>% (originally from magrittr).

However, you could use it with a function like rowSums inside filter when combined with the pipe operator %>%:

    > filter(mtcars, rowSums(. > 5) > 4)
    Error: object '.' not found

    > mtcars %>% filter(rowSums(. > 5) > 4) %>% head()
       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    3 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    4 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    5 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
    6 14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
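To make it clearer what the dot stands for in the piped version, the sketch below compares it with a call that names the data frame explicitly. The variable names piped and explicit are only for illustration, and the code assumes a dplyr version that re-exports %>%.

```r
library(dplyr)

# With the pipe, `.` is bound to the left-hand side (mtcars).
piped <- mtcars %>% filter(rowSums(. > 5) > 4)

# Without the pipe there is no `.`, so the data frame is spelled out.
explicit <- filter(mtcars, rowSums(mtcars > 5) > 4)

# Both calls should keep the same rows: those where more than
# four values are greater than 5.
same_result <- identical(piped, explicit)
```

Here the dot is the whole data frame rather than a single column, which is why it can be handed to rowSums.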

You should also take a look at the magrittr help files:

    library(magrittr)
    help("%>%")

From the help page:

Placing lhs elsewhere in rhs call

Often you will want lhs to the rhs call at another position than the first. For this purpose you can use the dot (.) as placeholder. For example, y %>% f(x, .) is equivalent to f(x, y) and z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z).

Using the dot for secondary purposes

Often, some attribute or property of lhs is desired in the rhs call in addition to the value of lhs itself, e.g. the number of rows or columns. It is perfectly valid to use the dot placeholder several times in the rhs call, but by design the behavior is slightly different when using it inside nested function calls. In particular, if the placeholder is only used in a nested function call, lhs will also be placed as the first argument! The reason for this is that in most use-cases this produces the most readable code. For example, iris %>% subset(1:nrow(.) %% 2 == 0) is equivalent to iris %>% subset(., 1:nrow(.) %% 2 == 0) but slightly more compact. It is possible to overrule this behavior by enclosing the rhs in braces. For example, 1:10 %>% {c(min(.), max(.))} is equivalent to c(min(1:10), max(1:10)).
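The three placeholder behaviours described in the help page can be tried out with a short sketch like the one below. It uses only magrittr and base R; the variable names are arbitrary.

```r
library(magrittr)

# Dot as a direct argument: lhs goes exactly where the dot is,
# so this is seq(1, 10, by = 2).
odd_numbers <- 2 %>% seq(1, 10, by = .)

# Dot only inside a nested call: lhs is also inserted as the first
# argument, so this is subset(iris, 1:nrow(iris) %% 2 == 0).
even_rows <- iris %>% subset(1:nrow(.) %% 2 == 0)

# Braces suppress the first-argument insertion,
# so this is c(min(1:10), max(1:10)).
value_range <- 1:10 %>% {c(min(.), max(.))}
```

The second case is the one at work in mtcars %>% filter(rowSums(. > 5) > 4): the dot appears only inside rowSums, so mtcars is still passed to filter as its first argument.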

talat answered Oct 05 '22 09:10