Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Adding a vector as new columns in existing data frame

Sample data

df <- data.frame(location = rep(1:1000, each = 36), 
                 year = rep(1980:2015,times = 1000),
                 mu = runif(min =  36.5, max = 43.2, 1000*36),
                 lambda = runif(min =  4.5, max = 4.8, 1000*36))

This data consits of 1000 locations and 36 years with two variables mu and lambda

For each location X year combination, I have a function which takes a value of lambda and mu and generates a vector of size 12. An example:

library(grofit)    
dat <- df[df$location == 1 & df$year == 1980,]  
y <- round(gompertz(1:12,100,dat$mu,dat$lambda), digits = 2)
y

  [1]  0.00  0.00  0.00  0.72 18.60 56.37 82.26 93.56 97.76 99.23
  [11] 99.74 99.91

If I want to add y as columns to dat

  new.col <- 5:16 
  dat[new.col] <- y
  dat

  location year       mu   lambda V5 V6 V7   V8   V9   V10   V11
     1     1980 39.60263 4.554095  0  0  0 0.72 18.6 56.37 82.26
  V12   V13   V14   V15   V16
  1 93.56 97.76 99.23 99.74 99.91

As you see, I have attached y as columns V5 till V16 in the dat. I want to repeat this for all location and year combination in df. I hope this is clear.

df %>% group_by(location year) %>% mutate(?? how to I add new columns for y??)
like image 515
89_Simple Avatar asked Jan 02 '23 15:01

89_Simple


2 Answers

You could use lapply():

library(grofit)    
df2 <- do.call(rbind, lapply(1:nrow(df), 
                             function(x) round(gompertz(1:12, 100, df[x, 3], df[x, 4]), 
                                               digits = 2)))
df3 <- cbind(df, df2)

result:

> head(df3)
  location year       mu   lambda 1 2 3    4     5     6     7     8     9    10    11    12
1        1 1980 43.04565 4.536717 0 0 0 0.61 20.58 61.23 85.88 95.39 98.54 99.55 99.86 99.96
2        1 1981 39.00524 4.505235 0 0 0 0.96 20.02 57.28 82.45 93.53 97.71 99.20 99.72 99.90
3        1 1982 41.60206 4.619627 0 0 0 0.42 17.07 56.52 83.18 94.23 98.10 99.38 99.80 99.94
4        1 1983 42.01069 4.689058 0 0 0 0.26 14.87 54.43 82.35 93.99 98.04 99.37 99.80 99.94
5        1 1984 40.34275 4.692595 0 0 0 0.30 14.36 52.30 80.54 93.03 97.61 99.20 99.73 99.91
6        1 1985 41.13246 4.641404 0 0 0 0.38 16.20 55.15 82.32 93.84 97.94 99.32 99.78 99.93

data:

set.seed(47)  # for sake of reproducibility
df <- data.frame(location = rep(1:1000, each = 36), 
                 year = rep(1980:2015, times = 1000),
                 mu = runif(min =  36.5, max = 43.2, 1000 * 36),
                 lambda = runif(min =  4.5, max = 4.8, 1000 * 36))
like image 146
jay.sf Avatar answered Jan 13 '23 11:01

jay.sf


Here is a tidyverse solution :

df <- head(df) # we'll work on a sample

library(tidyverse)
df %>%
  mutate(y = map2(mu,lambda,gompertz,time= 1:12,A = 100),
         y = map(y,. %>% round(2) %>% t %>% as_tibble)) %>% # we reformat the vectors as one line tibbles for smooth unnesting
  unnest %>%
  rename_at(5:16,~paste0("y",1:12))
#   location year       mu   lambda y1 y2 y3   y4    y5    y6    y7    y8    y9   y10   y11   y12
# 1        1 1980 38.52133 4.793232  0  0  0 0.20 11.20 46.38 76.37 90.97 96.73 98.84 99.59 99.86
# 2        1 1981 41.05032 4.668713  0  0  0 0.32 15.29 54.04 81.74 93.61 97.86 99.29 99.77 99.92
# 3        1 1982 36.76366 4.687794  0  0  0 0.45 13.67 48.07 76.37 90.55 96.41 98.66 99.51 99.82
# 4        1 1983 42.47994 4.766380  0  0  0 0.14 12.55 51.99 81.37 93.71 97.97 99.36 99.80 99.94
# 5        1 1984 36.58161 4.510503  0  0  0 1.09 18.81 53.90 79.56 91.89 96.92 98.85 99.57 99.84
# 6        1 1985 41.77695 4.705588  0  0  0 0.23 14.29 53.52 81.81 93.75 97.95 99.34 99.79 99.93

and a base version that should run faster:

new_df <- cbind(df,round(t(mapply(gompertz,  df$mu, df$lambda,MoreArgs = list(time= 1:12, A = 100))),2))
names(new_df)[5:16] <- paste0("y",1:12)
#   location year       mu   lambda y1 y2 y3   y4    y5    y6    y7    y8    y9   y10   y11   y12
# 1        1 1980 38.52133 4.793232  0  0  0 0.20 11.20 46.38 76.37 90.97 96.73 98.84 99.59 99.86
# 2        1 1981 41.05032 4.668713  0  0  0 0.32 15.29 54.04 81.74 93.61 97.86 99.29 99.77 99.92
# 3        1 1982 36.76366 4.687794  0  0  0 0.45 13.67 48.07 76.37 90.55 96.41 98.66 99.51 99.82
# 4        1 1983 42.47994 4.766380  0  0  0 0.14 12.55 51.99 81.37 93.71 97.97 99.36 99.80 99.94
# 5        1 1984 36.58161 4.510503  0  0  0 1.09 18.81 53.90 79.56 91.89 96.92 98.85 99.57 99.84
# 6        1 1985 41.77695 4.705588  0  0  0 0.23 14.29 53.52 81.81 93.75 97.95 99.34 99.79 99.93

An alternative to mapply that is not used so often is Vectorize, in this case I think its use is justified as this function seems like it really should be vectorized in the first place.

gompertz2 <- Vectorize(gompertz,c("mu","lambda"))
new_df <- cbind(df,round(t(gompertz2(1:12, 100, df$mu,df$lambda)),2))
names(new_df)[5:16] <- paste0("y",1:12)

# same output
like image 26
Moody_Mudskipper Avatar answered Jan 13 '23 09:01

Moody_Mudskipper