Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I separate a string into multiple rows in the same database in R?

I have a problem. This is my initial example data base:

nam <- c("Marco", "Clara")
code <- c("The liquidations code for Marco are: 51-BMR05, 74-VAD08, 176-VNF09.",
          "The liquidations code for Clara are: 88-BMR05, 90-VAD08, 152-VNF09.")

df <- data.frame(name,code)

That look like this:

This is the example data base

So what I want is for the codes after the double dot to be separated and be a record with the same name. That is to say that the database is transformed and finished in this way:

Example of the required data base

I need to know if there is any way in R that can help me to facilitate and speed up this work. I did the examples in excel. In advance, I thank everyone for their help.

like image 985
Josué Aguilar Avatar asked Sep 16 '25 17:09

Josué Aguilar


2 Answers

Here is a tidy solution:

library(tidyverse)

df %>% 
  # Remove text and trailing dot
  mutate(
    code = stringr::str_remove(
      string = code, 
      pattern = "The liquidations code for .* are: "
    ),
    code = stringr::str_remove(
      string = code, 
      pattern = "\\.$"
    )
  ) %>%
  # Split the codes (results in list column)
  mutate(code = stringr::str_split(code, ", ")) %>% 
  # Turn list column into new rows
  unnest(code)


#> # A tibble: 6 × 2
#>   name  code     
#>   <chr> <chr>    
#> 1 Marco 51-BMR05 
#> 2 Marco 74-VAD08 
#> 3 Marco 176-VNF09
#> 4 Clara 88-BMR05 
#> 5 Clara 90-VAD08 
#> 6 Clara 152-VNF09

Created on 2022-03-28 by the reprex package (v2.0.1)


Data

Same code as posted by OP, but fixed nam to name:

name <- c("Marco", "Clara")
code <- c("The liquidations code for Marco are: 51-BMR05, 74-VAD08, 176-VNF09.",
          "The liquidations code for Clara are: 88-BMR05, 90-VAD08, 152-VNF09.")
df <- data.frame(name,code)
like image 190
jpiversen Avatar answered Sep 18 '25 10:09

jpiversen


We could use str_extract_all to extract all the codes in list and then unnest

library(dplyr)
library(stringr)
library(tidyr)
df %>% 
  mutate(code = str_extract_all(code, "\\d+-[A-Z0-9]+")) %>% 
  unnest(code)

-output

# A tibble: 6 × 2
  name  code     
  <chr> <chr>    
1 Marco 51-BMR05 
2 Marco 74-VAD08 
3 Marco 176-VNF09
4 Clara 88-BMR05 
5 Clara 90-VAD08 
6 Clara 152-VNF09
like image 36
akrun Avatar answered Sep 18 '25 10:09

akrun