Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: apply-like function that returns a data frame?

Tags:

dataframe

r

I want to apply a function to every row of a data frame. Using apply, the result is not itself a data frame again, it looks more like a list or matrix? (I don't know enough R to be able to tell from the output i get, just that it isn't a data frame)

Which is the right function to use to apply a function to every row of a data frame, returning a new data frame?

The function i want to apply to each row:

map_uri <- function(request){
    ret <- request
    uri_stem <- uri_map[uri_map[,1] == request["cs-uri-query"],2]
    if(length(uri_stem) > 0){
        ret <- request
        ret["cs-uri-stem"] <- uri_stem
        ret["cs-uri-query"] <- "-"
    }
    if(request["cs-uri-stem"] == "/index.html"){
        ret["cs-uri-stem"] = "/"
    }

    return(ret)

}

what I am trying:

cleansed <- apply(requests, 1, map_uri)
cleansed[,c("cs-uri-query", "cs-uri-stem")]

which gives me the error

Fehler in cleansed[, c("cs-uri-stem", "cs-uri-query")] : Indizierung außerhalb der Grenzen

(Index out of bounds)

For some reason, the structure changes in a way that makes above indexing wrong.

[edit]

Data to make this a working example:

uri_map.tsv http://pastebin.com/XhUuTMqA

uri_map <- read.table("http://pastebin.com/raw/XhUuTMqA", sep="\t", header=FALSE)

And input data for the transformation function:

http://pastebin.com/b7ja4rKn

requests <- read.table("http://pastebin.com/raw/b7ja4rKn", sep=" ", header=TRUE)

like image 554
kutschkem Avatar asked May 02 '16 12:05

kutschkem


2 Answers

You can use the apply family but, you're right, the result is either a matrix or a list. Not a big deal though to get back to a data.frame.

Your function needs to return something consistent across columns (raw iris instead of iris[, 1:4] would not work below, because of iris$Species which is a factor with 3 levels where summary returns 6 numeric from a numeric column) and that's where a reproducible would help. Below, I used iris and summary:

  1. apply: as.data.frame(apply(iris[, 1:4], 2, summary))
  2. sapply: as.data.frame(sapply(iris[, 1:4], summary))
  3. lapply: do.call(cbind, lapply(iris[, 1:4], summary))
like image 60
Vincent Bonhomme Avatar answered Nov 08 '22 14:11

Vincent Bonhomme


I have just implemented this function, which applies FUN over rows as lists and concatenates the result to a tibble:

library(magrittr)

lapply_rows <- function(df, return_tibble = TRUE, FUN, ...) {
  df_rownames <- rownames(df)

  res <- lapply(purrr::transpose(df), FUN = FUN, ...) %>%
    purrr::map_depth(2, function(x) {
      if (length(x) != 1) {
        return(list(x))
      } else {
        return(x)
      }
    }) %>%
    dplyr::bind_rows()

  if (!return_tibble) {
    res <- as.data.frame(res)
    rownames(res) <- df_rownames
  }

  return(res)
}

df is converted to a list of lists by purrr::transpose(df), where each sublist is one row of the original df. FUN must return a named list, which can also contain elements with a length other than one. These elements are then wrapped in list() (type of a column of a data.frame-like object could be also a list). If return_tibble is FALSE, result is coerced to data.frame and original rownames are set.

Example:

df <- lapply_rows(mtcars, FUN = function(row_list) {
  row_list$cyl_2 <- row_list$cyl ** 2
  row_list$colors <- c("red", "green", "blue")
  row_list$sublist <- mtcars[1:5, 1:5]
  return(row_list)
})

head(df)
# A tibble: 6 x 14
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb cyl_2 colors    sublist         
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list>    <list>          
1  21       6   160   110  3.9   2.62  16.5     0     1     4     4    36 <chr [3]> <df[,5] [5 × 5]>
2  21       6   160   110  3.9   2.88  17.0     0     1     4     4    36 <chr [3]> <df[,5] [5 × 5]>
3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1    16 <chr [3]> <df[,5] [5 × 5]>
4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1    36 <chr [3]> <df[,5] [5 × 5]>
5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2    64 <chr [3]> <df[,5] [5 × 5]>
6  18.1     6   225   105  2.76  3.46  20.2     1     0     3     1    36 <chr [3]> <df[,5] [5 × 5]>

Example returning a data.frame:

df2 <- lapply_rows(mtcars, return_tibble = FALSE, FUN = function(row_list) {
  row_list$cyl_2 <- row_list$cyl ** 2
  row_list$colors <- c("red", "green", "blue")
  row_list$sublist <- mtcars[1:5, 1:5]
  return(row_list)
})

head(df2)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb cyl_2           colors
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4    36 red, green, blue
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4    36 red, green, blue
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1    16 red, green, blue
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1    36 red, green, blue
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2    64 red, green, blue
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1    36 red, green, blue
                                                                                                                                                                                       sublist
Mazda RX4         21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Mazda RX4 Wag     21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Datsun 710        21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Hornet 4 Drive    21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Hornet Sportabout 21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Valiant           21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15

(you can see that tibble is handling the <list> columns much better)

like image 45
jirinovo Avatar answered Nov 08 '22 14:11

jirinovo