Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to merge these two dataframes?

Tags:

r

dplyr

With R and dplyr :

library(dplyr)

tab1 <- tribble(
  ~code, ~periode, ~prix,
  "a", "2023-01-01", 3,
  "b", "2023-01-01", 2,
  "c", "2023-02-01", 2,
  "d", "2023-02-01", 4
)

tab2 <- tribble(
  ~code, ~`2018-01-01`,  ~`2018-02-01`,  ~`2019-01-01`, ~`2019-02-01`, ~`2020-01-01`,  
  ~`2020-02-01`,~`2021-01-01`,  ~`2021-02-01`, ~`2022-01-01`, ~`2022-02-01`,
  "a", 2, 1, 2, 3, 3, 4, 1, 2, 2, 1,
  "b", 1, 2, 4, 2, 1, 2, 1, 5, 1, 2,
  "c", 2, 1, 5, 6, 1, 3, 4, 3, 1, 3,
  "d", 3, 2, 7, 8, 6, 4, 5, 9, 7, 8
)

I'd like to add three columns N_1, N_2 and N_3 to tab1 so that, for example, when periode is 2023-01-01 in tab1 :

  • N_1 contains the value of 2022-01-01 in tab2
  • N_2 contains the value of 2021-01-01 in tab2
  • N_3 contains the value of 2020-01-01 in tab2

Here's a minimal reproducible example with the expected result :

# A tibble: 4 × 6
  code  periode     prix   N_1   N_2   N_3
  <chr> <chr>      <dbl> <dbl> <dbl> <dbl>
1 a     2023-01-01     3     2     1     3
2 b     2023-01-01     2     1     1     1
3 c     2023-02-01     2     3     3     3
4 d     2023-02-01     4     8     9     4

In the real table, there are of course all the months of each year, which is why it's important to find the right columns to add.

I tried something like that :

result <- tab1 |>
 group_by(periode) |> 
 mutate(N_1 = tab2[[""]]... 

Many thanks in advance !

like image 277
Damien Dotta Avatar asked Jun 16 '26 12:06

Damien Dotta


1 Answers

It seems what you're trying to do is to pivot tab2 into longer form, filter for years 2020 to 2022, rename them to N_1 to N_3, then pivot them back to wide form grouping by month.

library(tidyverse)
library(lubridate)
tab2 %>% 
  pivot_longer(-code,names_to = "date") %>%
  mutate(year = year(date),
         month = month(date)) %>%
  filter(year %in% (2020:2022)) %>% 
  select(-date) %>%
  mutate(year = recode(year, `2022` = "N_1", `2021` = "N_2", `2020` = "N_3")) %>%
  pivot_wider(id_cols = c("month","code"),names_from = "year") -> tab2_revised

tab2_revised
## A tibble: 8 × 5
#  month code    N_3   N_2   N_1
#  <dbl> <chr> <dbl> <dbl> <dbl>
#1     1 a         3     1     2
#2     2 a         4     2     1
#3     1 b         1     1     1
#4     2 b         2     5     2
#5     1 c         1     4     1
#6     2 c         3     3     3
#7     1 d         6     5     7
#8     2 d         4     9     8

Once we have that, we can join to tab1 by month.

tab1 %>%
  mutate(month = month(ymd(periode))) %>%
  left_join(tab2_revised, by = c("code","month")) %>%
  select(code,periode,prix,N_1,N_2,N_3)
## A tibble: 4 × 6
#  code  periode     prix   N_1   N_2   N_3
#  <chr> <chr>      <dbl> <dbl> <dbl> <dbl>
#1 a     2023-01-01     3     2     1     3
#2 b     2023-01-01     2     1     1     1
#3 c     2023-02-01     2     3     3     3
#4 d     2023-02-01     4     8     9     4
like image 106
Ian Campbell Avatar answered Jun 18 '26 02:06

Ian Campbell