Separate a shopping list into multiple columns

Tags:

tidyr

I have shopping list data like this:

df <- data.frame(id = 1:5, item = c("apple2milk5", "milk1", "juice3apple5", "egg10juice1", "egg8milk2"), stringsAsFactors = FALSE)

#   id         item
# 1  1  apple2milk5
# 2  2        milk1
# 3  3 juice3apple5
# 4  4  egg10juice1
# 5  5    egg8milk2

I want to separate the variable item into multiple columns, one per good, and record the number that follows each good. The problem is that each person purchases different goods, so I cannot solve it using tidyr::separate() or other analogous functions. What I expect is:

#   id apple milk  juice egg  
# 1  1 2     5     NA    NA   
# 2  2 NA    1     NA    NA   
# 3  3 5     NA    3     NA   
# 4  4 NA    NA    1     10   
# 5  5 NA    2     NA    8

Note: The categories of goods in the market are unknown. So don't assume there are only 4 kinds of goods.

Thanks for any help!


asked Nov 12 '19 15:11

Darren Tsai


3 Answers

I just came up with a tidyverse solution which uses stringr::str_extract_all() to extract the quantities, sets their names as product names, and expands them to wide using tidyr::unnest_wider().

library(tidyverse)

df %>%
  mutate(N = map2(str_extract_all(item, "\\d+"), str_extract_all(item, "\\D+"), set_names)) %>%
  unnest_wider(N, transform = as.numeric)

# # A tibble: 5 × 6
#      id item         apple  milk juice   egg
#   <int> <chr>        <dbl> <dbl> <dbl> <dbl>
# 1     1 apple2milk5      2     5    NA    NA
# 2     2 milk1           NA     1    NA    NA
# 3     3 juice3apple5     5    NA     3    NA
# 4     4 egg10juice1     NA    NA     1    10
# 5     5 egg8milk2       NA     2    NA     8
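To see what the set_names step is doing for a single string, here is a minimal base R sketch. It is not the code from the answer: it swaps in regmatches()/gregexpr() and setNames() for the stringr and purrr calls, and the variable names (x, qty, goods, named_qty) are only illustrative.

```r
# One shopping-list string taken from the example data
x <- "egg10juice1"

# Runs of digits are the quantities, runs of non-digits are the goods
qty   <- regmatches(x, gregexpr("[0-9]+", x))[[1]]
goods <- regmatches(x, gregexpr("[^0-9]+", x))[[1]]

# Name each quantity after the good it follows
named_qty <- setNames(as.numeric(qty), goods)
named_qty
```

Each row becomes a named vector like this one, and the names are what unnest_wider() turns into column names.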


answered Oct 13 '22 07:10

Darren Tsai

I'll add yet another answer. It only slightly differs from @ASuliman's but uses a bit of the newer tidyr and some cute regex to become a bit more straightforward.

The regex trick is that the pattern "(?<=\\d)\\B(?=[a-z])" matches the non-boundary (i.e. the empty position) between a digit and the letter that follows it, so separate_rows() creates one row for every "apple5" type of entry. extract() then puts the letters into an item column and the numbers into a count column. Using the new pivot_wider(), which replaces spread(), you can convert those counts to numeric values as you reshape.

library(dplyr)
library(tidyr)

df %>%
  separate_rows(item, sep = "(?<=\\d)\\B(?=[a-z])") %>%
  extract(item, into = c("item", "count"), regex = "^([a-z]+)(\\d+)$") %>%
  pivot_wider(names_from = item, values_from = count, values_fn = list(count = as.numeric))
#> # A tibble: 5 x 5
#>      id apple  milk juice   egg
#>   <int> <dbl> <dbl> <dbl> <dbl>
#> 1     1     2     5    NA    NA
#> 2     2    NA     1    NA    NA
#> 3     3     5    NA     3    NA
#> 4     4    NA    NA     1    10
#> 5     5    NA     2    NA     8
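To check where that pattern matches, it can be tried on a single string outside the pipeline. This is only a base R sketch with perl = TRUE, not what separate_rows() does internally. The three-item input string and the variable names are made up for illustration.

```r
# Hypothetical input with three goods
x <- "juice3apple5milk2"
pattern <- "(?<=\\d)\\B(?=[a-z])"

# gregexpr() reports where the pattern matches; every match is empty
m   <- gregexpr(pattern, x, perl = TRUE)[[1]]
pos <- as.integer(m)             # start positions of the empty matches
len <- attr(m, "match.length")   # all zero, nothing is consumed

# Cut the string just before each matched position
pieces <- substring(x, c(1, pos), c(pos - 1, nchar(x)))
pieces
```

Because the matches have length zero, no characters are lost when splitting, which is why each piece still carries both the good and its quantity.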

answered Oct 13 '22 08:10

camille

Possibly something like this, which should work with any item/quantity. It just assumes that the quantity follows the item.

Let's use a custom function which extracts item and quantity:

my_fun <- function(w) {
  # splitting on the digits leaves the item names
  items <- stringr::str_split(w, "\\d+", simplify = TRUE)
  # str_split() leaves an empty string where the input starts or ends with a match; drop it
  items <- items[items != ""]
  # splitting on the non-digits leaves the quantities
  quantities <- stringr::str_split(w, "\\D+", simplify = TRUE)
  quantities <- quantities[quantities != ""]

  data.frame(item = items, quantity = quantities, stringsAsFactors = FALSE)
}

Example:

my_fun("apple2milk5")
# gives:
#    item quantity
# 1 apple        2
# 2  milk        5
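As for the empty spot that has to be filtered out inside my_fun: it appears whenever the string starts or ends with the thing you split on. A small base R sketch of the same idea follows. It uses strsplit() rather than stringr::str_split(), so the details differ slightly: base strsplit() keeps a leading empty piece but drops a trailing one, while str_split() keeps both.

```r
w <- "apple2milk5"

# Splitting on non-digits: the string starts with a match ("apple"),
# so the first piece is an empty string
qty_pieces <- strsplit(w, "[^0-9]+")[[1]]
quantities <- qty_pieces[qty_pieces != ""]

# Splitting on digits: base strsplit() drops the trailing empty piece
items <- strsplit(w, "[0-9]+")[[1]]
```

Filtering with != "" is harmless in both cases, so keeping it in my_fun is a reasonable safeguard.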

Now we can apply the function to each id, using nest and map:

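library(dplyr)
library(tidyr)  # nest() and unnest() come from tidyr

df_result <- df %>%
  nest(data = item) %>%
  # each nested tibble holds one item string; pass it to my_fun
  mutate(res = purrr::map(data, ~ my_fun(.x$item))) %>%
  select(-data) %>%
  unnest(res)

df_result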
# # A tibble: 9 x 3
#      id item  quantity
#   <int> <chr> <chr>
# 1     1 apple 2
# 2     1 milk  5
# 3     2 milk  1
# 4     3 juice 3
# 5     3 apple 5
# 6     4 egg   10
# 7     4 juice 1
# 8     5 egg   8
# 9     5 milk  2

Now we can use dcast() (spread would probably work too):

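# data.table::dcast() expects a data.table, so convert the tibble first
data.table::dcast(data.table::as.data.table(df_result), id ~ item, value.var = "quantity")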

#     id apple  egg juice milk
#   1  1     2 <NA>  <NA>    5
#   2  2  <NA> <NA>  <NA>    1
#   3  3     5 <NA>     3 <NA>
#   4  4  <NA>   10     1 <NA>
#   5  5  <NA>    8  <NA>    2
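If neither data.table nor tidyr is at hand, the long-to-wide step can also be sketched in base R with tapply(). This is an alternative technique, not what the answer uses. The long data frame below is typed in by hand from the df_result rows printed above, with quantity stored as numeric (an assumption, since my_fun returns it as character).

```r
# Long format: one row per id/item pair, copied from the df_result output
long <- data.frame(
  id       = c(1, 1, 2, 3, 3, 4, 4, 5, 5),
  item     = c("apple", "milk", "milk", "juice", "apple",
               "egg", "juice", "egg", "milk"),
  quantity = c(2, 5, 1, 3, 5, 10, 1, 8, 2)
)

# Cross-tabulate id by item; combinations that never occur become NA
wide <- tapply(long$quantity, list(id = long$id, item = long$item), sum)
wide
```

The result is a matrix rather than a data frame, with ids as row names and items (sorted alphabetically) as column names.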

Data:

df <- data.frame(id = 1:5, item = c("apple2milk5", "milk1", "juice3apple5", "egg10juice1", "egg8milk2"), stringsAsFactors = FALSE)

answered Oct 13 '22 08:10

RLave
