Running aggregate function within dmapply (ddR package)

I would like to run the aggregate function inside the dmapply function provided by the ddR package.

Desired results

The desired result is the simple output produced by aggregate in base R:

aggregate(
  x = mtcars$mpg,
  FUN = function(x) {
    mean(x, na.rm = TRUE)
  },
  by = list(trans = mtcars$am)
)

which produces:

  trans        x
1     0 17.14737
2     1 24.39231

Attempt - dmapply

I would like to arrive at the same result using dmapply, as attempted below:

# ddR
require(ddR)

# ddR object creation
distMtcars <- as.dframe(mtcars)

# Aggregate / dmapply
dmapply(
  FUN = function(x, y) {
    aggregate(FUN = mean(x, na.rm = TRUE),
              x = x,
              by = list(trans = y))
  },
  distMtcars$mpg,
  y = distMtcars$am,
  output.type = "dframe",
  combine = "rbind"
)

The code fails:

Error in match.fun(FUN) : 'mean(x, na.rm = TRUE)' is not a function, character or symbol
Called from: match.fun(FUN)


Updates

Fixing the error pointed out by @Mike removes the error message, but the code still does not produce the desired result. The code:

# Avoid namespace conflict with other packages
ddR::collect(
  dmapply(
    FUN = function(x, y) {
      aggregate(
        FUN = function(x) {
          mean(x, na.rm = TRUE)
        },
        x = x,
        by = list(trans = y)
      )
    },
    distMtcars$mpg,
    y = distMtcars$am,
    output.type = "dframe",
    combine = "rbind"
  )
)

yields:

[1] trans x    
<0 rows> (or 0-length row.names)
Konrad asked Oct 29 '22 08:10


1 Answer

It works for me if you make the FUN argument consistent with the one in your base R call: FUN = function(x) mean(x, na.rm = TRUE). match.fun() cannot use mean(x, na.rm = TRUE) because that is a function call, not a function; mean on its own is a function.
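The function-versus-call distinction can be checked in plain base R, without ddR. In this minimal sketch, match.fun() accepts mean itself, while mean(x, na.rm = TRUE) is evaluated to a single number first and is then rejected. It also relies on aggregate() forwarding extra arguments such as na.rm = TRUE to FUN through its `...`, so FUN = mean works without an anonymous wrapper. The variable names f, bad and res are only illustrative.

```r
# match.fun() accepts a function object and returns it
f <- match.fun(mean)
print(f(c(1, NA, 3), na.rm = TRUE))

# mean(x, na.rm = TRUE) is a call: it evaluates to a number, which
# match.fun() rejects with the same error message as in the question
x <- mtcars$mpg
bad <- tryCatch(match.fun(mean(x, na.rm = TRUE)),
                error = function(e) conditionMessage(e))
print(bad)

# Pass mean itself; aggregate() forwards na.rm = TRUE to it via `...`
res <- aggregate(x = mtcars$mpg, by = list(trans = mtcars$am),
                 FUN = mean, na.rm = TRUE)
print(res)
```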

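Also, you will get NA results unless you change x = distMtcars$mpg to x = collect(distMtcars)$mpg, and likewise for y. In the code below, each collected vector is also wrapped in list(), so that dmapply, which iterates element-wise like mapply, calls FUN once on the whole column rather than once per element. With all that said, I think this should work for you: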

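library(ddR)
distMtcars <- as.dframe(mtcars)

res <- dmapply(
  FUN = function(x, y) {
    # FUN must be a function, not an evaluated call such as mean(x)
    aggregate(FUN = function(v) mean(v, na.rm = TRUE),
              x = x,
              by = list(trans = y))
  },
  # collect() returns ordinary vectors; list() passes each one whole
  x = list(collect(distMtcars)$mpg),
  y = list(collect(distMtcars)$am),
  output.type = "dframe",
  combine = "rbind"
)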

Then call collect(res) to see the result:

collect(res)
#  trans        x
#1     0 17.14737
#2     1 24.39231
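The effect of the list() wrapper can be illustrated with base R's mapply. The ddR documentation describes dmapply as a distributed counterpart of mapply; treating the two as iterating the same way is an assumption here, and this sketch does not use ddR at all. agg_mean is a hypothetical helper name, and do.call(rbind, ...) only approximates what combine = "rbind" does.

```r
# Hypothetical helper: mean mpg per transmission type
agg_mean <- function(x, y) {
  aggregate(x = x, by = list(trans = y), FUN = mean, na.rm = TRUE)
}

# Without list(): mapply walks the vectors element by element, so
# agg_mean() sees one mpg/am pair per call (one one-row result per car)
per_row <- mapply(agg_mean, mtcars$mpg, mtcars$am, SIMPLIFY = FALSE)
print(length(per_row))

# With list(): each argument has a single element (the whole column),
# so agg_mean() runs once on the full data
whole <- mapply(agg_mean, list(mtcars$mpg), list(mtcars$am),
                SIMPLIFY = FALSE)

# Row-bind the per-call results, roughly like combine = "rbind"
out <- do.call(rbind, whole)
print(out)
```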
Mike H. answered Nov 15 '22 07:11