Context
As a follow-up to "R: Pass data.frame by reference to a function" and "How to add a column in the data frame within a function",
I am attempting the following, seemingly easy, function:
naToZero <- function(df) {
  df$Vol[is.na(df$Vol)] <- 0
}
Data.frame
> str(WFM)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 990571 obs. of 14 variables:
$ Date : chr "04/12/2017" "04/12/2017" "04/12/2017" "04/12/2017" ...
$ Time :Classes 'hms', 'difftime' atomic [1:990571] 41970 41969 41968 41967 41966 ...
.. ..- attr(*, "units")= chr "secs"
$ Bar# : chr "197953/197953" NA "197952/197953" NA ...
$ Bar Index : int 0 NA -1 NA NA -2 NA NA -3 NA ...
$ Tick Range: int 0 NA 0 NA NA 0 NA NA 0 NA ...
$ Open : num 33.9 NA 33.9 NA NA ...
$ High : num 33.9 NA 33.9 NA NA ...
$ Low : num 33.9 NA 33.9 NA NA ...
$ Close : num 33.9 NA 33.9 NA NA ...
$ Vol : int 100 NA 200 NA NA 100 NA NA 400 NA ...
$ MACDHist : num -59 NA -87 NA NA ...
$ MACD : num -450 NA -445 NA NA ...
$ MACDSig : num -391 NA -358 NA NA ...
$ ZScore1 : num NA NA NA NA NA NA NA NA NA NA ...
Hoping to use this function to speed things up in data cleaning.
Problem
I run the function definition in the script editor and then call it on a data.frame, but nothing seems to happen: when I View(WFM), it's still the same old data. However, when I manually run the command:
WFM$Vol[is.na(WFM$Vol)] <- 0
Then it works.
Things I tried
I tried experimenting based on the two links above, which seemed relevant:
Using WFM <- naToZero(WFM) produces a vector with a single value, 0.
Tried using WFM <- data.table(WFM) and running the function... same thing.
I must be missing something basic.
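Virtually all objects in R are immutable: operations do not modify the original, they create a modified copy. So you need to assign that copy back to the original.
<- does that, but inside your function it assigns to df, which is a local copy of the argument (WFM) you pass in, so the caller's WFM is never touched. And because that assignment is the last expression in the function, its value (the 0 on the right-hand side) is what the function returns, which is why WFM <- naToZero(WFM) gave you a single 0.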
So you need to modify your function:
naToZero <- function(df) {
  df$Vol[is.na(df$Vol)] <- 0
  df
}
… and how you call it:
WFM = naToZero(WFM)
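To see the difference side by side, here is a minimal sketch in base R. The data frame demo and the names naToZeroBroken and naToZeroFixed are made up for illustration; they stand in for WFM and the two versions of naToZero above.

```r
# Small stand-in for WFM (hypothetical data, not the asker's real table)
demo <- data.frame(Vol = c(100L, NA, 200L))

# Original version: the last expression is an assignment, so the function
# returns the assigned value (0) invisibly and never touches the caller's object
naToZeroBroken <- function(df) {
  df$Vol[is.na(df$Vol)] <- 0
}

# Fixed version: return the modified local copy
naToZeroFixed <- function(df) {
  df$Vol[is.na(df$Vol)] <- 0
  df
}

broken_result <- naToZeroBroken(demo)  # a single 0, not a data frame
fixed_result  <- naToZeroFixed(demo)   # data frame with the NA replaced

# demo itself still contains the NA after both calls;
# only assigning the result back (demo <- naToZeroFixed(demo)) would change it
```

Note that Vol starts out as an integer column and becomes numeric once the double 0 is assigned into it; use 0L if the integer type should be kept.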
We can make this more dynamic using the devel version of dplyr (soon to be released as 0.6.0):
library(tidyverse)
naToZero <- function(df, Col) {
  Col <- enquo(Col)        # capture the bare column name as a quosure
  ColN <- quo_name(Col)    # its name as a string, for the left-hand side
  df %>%
    mutate(!!ColN := replace(!!Col, is.na(!!Col), 0))
}
naToZero(WFM, Vol)
# A tibble: 3 × 2
# Date Vol
# <chr> <dbl>
#1 04/12/2017 0
#2 04/12/2017 23
#3 04/12/2017 40
Or for any other column:
naToZero(WFM, Open)
# A tibble: 3 × 3
# Date Vol Open
# <chr> <dbl> <dbl>
#1 04/12/2017 NA 33.9
#2 04/12/2017 23 0.0
#3 04/12/2017 40 32.0
enquo does a job similar to substitute from base R: it takes the input argument and converts it to a quosure. Inside mutate, we unquote (!! or UQ) to evaluate the column, as well as the string created with quo_name on the left-hand side of :=.
Data
WFM <- tibble(Date = rep("04/12/2017", 3), Vol = c(NA, 23, 40), Open = c(33.9, NA, 32))
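For comparison, the substitute route mentioned above can be sketched in base R alone. This is not the answer's method, just an illustration under the assumption that a plain data.frame and a bare column name are passed; naToZeroBase and demo are hypothetical names.

```r
# Hypothetical base-R variant of the dplyr function above
naToZeroBase <- function(df, Col) {
  # substitute() captures the unevaluated argument; deparse() turns it into a string
  col_name <- deparse(substitute(Col))
  stopifnot(col_name %in% names(df))

  # Replace NAs in that column of the local copy, then return the copy
  df[[col_name]][is.na(df[[col_name]])] <- 0
  df
}

# Same shape as the example data, built as a plain data.frame
demo <- data.frame(
  Date = rep("04/12/2017", 3),
  Vol  = c(NA, 23, 40),
  Open = c(33.9, NA, 32)
)

vol_filled  <- naToZeroBase(demo, Vol)   # only Vol is filled
open_filled <- naToZeroBase(demo, Open)  # only Open is filled
```

As with the other versions, the result has to be assigned back (demo <- naToZeroBase(demo, Vol)) for the change to stick.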