Context
As a follow-up to "R: Pass data.frame by reference to a function" and "How to add a column in the data frame within a function",
I am attempting to write the following, seemingly easy, function:
naToZero <- function(df) {
  df$Vol[is.na(df$Vol)] <- 0
}
Data.frame
> str(WFM)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 990571 obs. of 14 variables:
$ Date : chr "04/12/2017" "04/12/2017" "04/12/2017" "04/12/2017" ...
$ Time :Classes 'hms', 'difftime' atomic [1:990571] 41970 41969 41968 41967 41966 ...
.. ..- attr(*, "units")= chr "secs"
$ Bar# : chr "197953/197953" NA "197952/197953" NA ...
$ Bar Index : int 0 NA -1 NA NA -2 NA NA -3 NA ...
$ Tick Range: int 0 NA 0 NA NA 0 NA NA 0 NA ...
$ Open : num 33.9 NA 33.9 NA NA ...
$ High : num 33.9 NA 33.9 NA NA ...
$ Low : num 33.9 NA 33.9 NA NA ...
$ Close : num 33.9 NA 33.9 NA NA ...
$ Vol : int 100 NA 200 NA NA 100 NA NA 400 NA ...
$ MACDHist : num -59 NA -87 NA NA ...
$ MACD : num -450 NA -445 NA NA ...
$ MACDSig : num -391 NA -358 NA NA ...
$ ZScore1 : num NA NA NA NA NA NA NA NA NA NA ...
Hoping to use this function to speed things up in data cleaning.
Problem
I define the function in the script editor and then pass a data.frame to it, but the function does not appear to do anything: when I View(WFM), it is still the same old data. However, when I manually run the command:
WFM$Vol[is.na(WFM$Vol)] <- 0
Then it works.
Things I tried
I experimented based on the two linked questions, which seemed relevant:
Using WFM <- naToZero(WFM) produces a vector with a single value, 0.
Tried using WFM <- data.table(WFM) and running the function... same thing.
I must be missing something basic.
Virtually all objects in R are immutable: operations do not modify the original, they create a copy. So you need to assign that copy back to the original.
The <- operator does that, but inside your function it assigns to df, which is a local copy of the argument (= WFM) you pass to your function.
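So you need to modify your function to return the modified data frame. (As written, it returns the value of its last expression, the assignment, which is the right-hand side 0. That is why WFM <- naToZero(WFM) gave you a single 0.)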
naToZero <- function(df) {
  df$Vol[is.na(df$Vol)] <- 0
  df
}
… and how you call it:
WFM = naToZero(WFM)
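To make the difference concrete, here is a minimal sketch comparing the two versions on a tiny made-up data frame. The names toy, naToZeroOld and naToZeroNew are illustrative only and do not come from the question.

```r
# Hypothetical three-row stand-in for WFM (illustrative data only)
toy <- data.frame(Vol = c(100L, NA, 200L))

# Original version: the last expression is an assignment,
# so the function returns the assigned value (0), not the data frame
naToZeroOld <- function(df) {
  df$Vol[is.na(df$Vol)] <- 0
}

# Fixed version: explicitly return the modified local copy
naToZeroNew <- function(df) {
  df$Vol[is.na(df$Vol)] <- 0
  df
}

old_result <- naToZeroOld(toy)   # a single number, 0
still_na   <- anyNA(toy$Vol)     # TRUE: the caller's toy was never modified
fixed      <- naToZeroNew(toy)   # a data frame with the NA replaced by 0

print(old_result)
print(still_na)
print(fixed)
```

Only the result assigned back by the caller (fixed here, or WFM in the call above) carries the change; toy itself keeps its NA until you overwrite it.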
We can make this more dynamic using the devel version of dplyr (soon to be released 0.6.0)
library(tidyverse)
naToZero <- function(df, Col) {
  Col <- enquo(Col)
  ColN <- quo_name(Col)
  df %>%
    mutate(!!ColN := replace(!!Col, is.na(!!Col), 0))
}
naToZero(WFM, Vol)
# A tibble: 3 × 2
# Date Vol
# <chr> <dbl>
#1 04/12/2017 0
#2 04/12/2017 23
#3 04/12/2017 40
Or for any other columns
naToZero(WFM, Open)
# A tibble: 3 × 3
# Date Vol Open
# <chr> <dbl> <dbl>
#1 04/12/2017 NA 33.9
#2 04/12/2017 23 0.0
#3 04/12/2017 40 32.0
enquo does a job similar to substitute from base R: it takes the input argument and converts it to a quosure. Inside mutate, we unquote with !! (or UQ) to evaluate the column, and also to use the string created with quo_name as the column name on the lhs of :=.
Data
WFM <- tibble(Date = rep("04/12/2017", 3), Vol = c(NA, 23, 40), Open = c(33.9, NA, 32))
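For comparison, the same idea can be sketched in base R only, using substitute as mentioned above. This is an illustrative alternative, not part of the dplyr answer; the names naToZeroBase and toy are assumptions made for the example.

```r
# Base R sketch: capture the unquoted column name with substitute(),
# turn it into a string with deparse(), then index with [[ ]]
naToZeroBase <- function(df, Col) {
  ColN <- deparse(substitute(Col))       # e.g. "Vol" for naToZeroBase(toy, Vol)
  df[[ColN]][is.na(df[[ColN]])] <- 0     # replace NA in that column of the local copy
  df                                     # return the modified copy
}

# Same shape as the example data, built with base data.frame()
toy <- data.frame(Date = rep("04/12/2017", 3),
                  Vol  = c(NA, 23, 40),
                  Open = c(33.9, NA, 32))

res <- naToZeroBase(toy, Vol)
print(res)
```

As with the other versions, the result has to be assigned back (for example toy <- naToZeroBase(toy, Vol)) for the change to persist. Only the named column is touched, so the NA in Open stays as it is.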