I want to apply a function to every row of a data frame. Using apply, the result is not itself a data frame again; it looks more like a list or a matrix. (I don't know enough R to tell from the output I get, only that it isn't a data frame.)
Which is the right function to use to apply a function to every row of a data frame, returning a new data frame?
The function I want to apply to each row:
map_uri <- function(request){
  ret <- request
  uri_stem <- uri_map[uri_map[,1] == request["cs-uri-query"],2]
  if(length(uri_stem) > 0){
    ret <- request
    ret["cs-uri-stem"] <- uri_stem
    ret["cs-uri-query"] <- "-"
  }
  if(request["cs-uri-stem"] == "/index.html"){
    ret["cs-uri-stem"] = "/"
  }
  return(ret)
}
What I am trying:
cleansed <- apply(requests, 1, map_uri)
cleansed[,c("cs-uri-query", "cs-uri-stem")]
which gives me the error
Error in cleansed[, c("cs-uri-stem", "cs-uri-query")] : subscript out of bounds
For some reason, the structure changes in a way that makes the indexing above wrong.
[edit]
Data to make this a working example:
uri_map.tsv http://pastebin.com/XhUuTMqA
uri_map <- read.table("http://pastebin.com/raw/XhUuTMqA", sep="\t", header=FALSE)
And input data for the transformation function:
http://pastebin.com/b7ja4rKn
requests <- read.table("http://pastebin.com/raw/b7ja4rKn", sep=" ", header=TRUE)
You can use the apply family but, you're right, the result is either a matrix or a list. It is not a big deal to get back to a data.frame, though.
Your function needs to return something consistent across columns, and that's where a reproducible example would help. Below, I used iris and summary. Raw iris instead of iris[, 1:4] would not work, because summary returns 6 numbers for a numeric column but only 3 counts for iris$Species, which is a factor with 3 levels:
as.data.frame(apply(iris[, 1:4], 2, summary))
as.data.frame(sapply(iris[, 1:4], summary))
do.call(cbind, lapply(iris[, 1:4], summary))
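The three calls above work column by column. For the row-wise case in the question, a minimal sketch follows. It assumes a small made-up `requests` data frame and a simplified, hypothetical row function `normalise_stem` (neither is the asker's real data or function). The point it shows is that `apply(df, 1, f)` returns one column per input row, so the result has to be transposed before it is turned back into a data frame.

```r
# Made-up stand-in for the question's data; check.names = FALSE keeps the
# hyphenated column names as they are.
requests <- data.frame(
  "cs-uri-stem"  = c("/index.html", "/page", "/other"),
  "cs-uri-query" = c("-", "id=1", "-"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical, simplified row function: receives one row as a named
# character vector and returns a vector of the same shape.
normalise_stem <- function(request) {
  if (request["cs-uri-stem"] == "/index.html") {
    request["cs-uri-stem"] <- "/"
  }
  request
}

# apply() over rows yields a matrix with one COLUMN per input row,
# which is why indexing by column name fails on the raw result.
raw <- apply(requests, 1, normalise_stem)
dim(raw)

# Transpose first, then convert back to a data frame.
cleansed <- as.data.frame(t(raw), stringsAsFactors = FALSE)
cleansed[, c("cs-uri-query", "cs-uri-stem")]
```

Note that apply converts the data frame to a matrix first, so with mixed column types every value reaches the function as a character string and the columns of the result are character as well.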
I have just implemented this function, which applies FUN over rows as lists and concatenates the result to a tibble:
library(magrittr)
lapply_rows <- function(df, return_tibble = TRUE, FUN, ...) {
  df_rownames <- rownames(df)
  res <- lapply(purrr::transpose(df), FUN = FUN, ...) %>%
    purrr::map_depth(2, function(x) {
      if (length(x) != 1) {
        return(list(x))
      } else {
        return(x)
      }
    }) %>%
    dplyr::bind_rows()
  if (!return_tibble) {
    res <- as.data.frame(res)
    rownames(res) <- df_rownames
  }
  return(res)
}
df is converted to a list of lists by purrr::transpose(df), where each sublist is one row of the original df. FUN must return a named list, which can also contain elements with a length other than one. These elements are then wrapped in list() (a column of a data.frame-like object can itself be a list). If return_tibble is FALSE, the result is coerced to a data.frame and the original rownames are set.
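For comparison, the same row-as-list idea can be sketched in base R without purrr or dplyr. `apply_rows_base` is a hypothetical helper name, not a function from any package, and this version is deliberately narrower than `lapply_rows`: it assumes the data frame has at least one row and that FUN returns a named list whose elements all have length one.

```r
# Hypothetical base R helper (not from any package).
# Assumes nrow(df) >= 1 and that FUN returns a named list of length-1 elements.
apply_rows_base <- function(df, FUN, ...) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    # as.list() on a one-row data frame gives a named list with one
    # length-1 element per column, so column types are preserved.
    out <- FUN(as.list(df[i, , drop = FALSE]), ...)
    as.data.frame(out, stringsAsFactors = FALSE)
  })
  # Stack the one-row data frames and restore the original row names.
  res <- do.call(rbind, rows)
  rownames(res) <- rownames(df)
  res
}

res <- apply_rows_base(mtcars, function(row_list) {
  row_list$cyl_2 <- row_list$cyl^2
  row_list
})
head(res[, c("cyl", "cyl_2")])
```

Unlike apply, this keeps numeric columns numeric, because each row is passed as a list rather than being coerced to a single vector type. Binding many one-row data frames with rbind can be slow on large inputs, so treat it as an illustration rather than a tuned implementation.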
Example:
df <- lapply_rows(mtcars, FUN = function(row_list) {
  row_list$cyl_2 <- row_list$cyl^2
  row_list$colors <- c("red", "green", "blue")
  row_list$sublist <- mtcars[1:5, 1:5]
  return(row_list)
})
head(df)
# A tibble: 6 x 14
mpg cyl disp hp drat wt qsec vs am gear carb cyl_2 colors sublist
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 36 <chr [3]> <df[,5] [5 × 5]>
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 36 <chr [3]> <df[,5] [5 × 5]>
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 16 <chr [3]> <df[,5] [5 × 5]>
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 36 <chr [3]> <df[,5] [5 × 5]>
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 64 <chr [3]> <df[,5] [5 × 5]>
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 36 <chr [3]> <df[,5] [5 × 5]>
Example returning a data.frame:
df2 <- lapply_rows(mtcars, return_tibble = FALSE, FUN = function(row_list) {
  row_list$cyl_2 <- row_list$cyl^2
  row_list$colors <- c("red", "green", "blue")
  row_list$sublist <- mtcars[1:5, 1:5]
  return(row_list)
})
head(df2)
mpg cyl disp hp drat wt qsec vs am gear carb cyl_2 colors
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 36 red, green, blue
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 36 red, green, blue
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 16 red, green, blue
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 36 red, green, blue
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 64 red, green, blue
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 36 red, green, blue
sublist
Mazda RX4 21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Mazda RX4 Wag 21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Datsun 710 21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Hornet 4 Drive 21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Hornet Sportabout 21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Valiant 21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
(You can see that tibble handles the <list> columns much better.)