Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

convert lists of values in a string, separated by two different delimiters, into separate rows

I have a data.table (dt) which looks like this

#    id               value
# 1:  1      {1:3.2,2:14.2}
# 2:  2      {1:14.0,2:3.0}
# 3:  3 {1:3.4,2:3.9,3:0.1}
# 4:  4             {1:2.1}

The value column comprises lists of values, where each one is introduced by its position in the list (1,2,3...) followed by a :, then the value itself. list values are separated by commas, and the whole list is enclosed in braces {...}.

I want to convert this to a long format (dt.all) like

#    id N value
# 1:  1 1   3.2
# 2:  1 2  14.2
# 3:  2 1  14.0
# 4:  2 2   3.0
# 5:  3 1   3.4
# 6:  3 2   3.9
# 7:  3 3   0.1
# 8:  4 1   2.1

Various failed attempts include:

stingr::str_extract(string = dt$value, pattern = "(?<=:).*(?=,)")
gsub(".*: *(.*?) *,.*", "\\1", dt$value)
gsubfn:strapplyc(dt$value, ":(.*?),", simplify = c)

Any suggestions on a neat way to do this? data.table solutions preferred, but failing that will accept tidyverse or other methods.

The data:

dt = structure(list(id = 1:4, value = c("{1:3.2,2:14.2}", "{1:14.0,2:3.0}", 
  "{1:3.4,2:3.9,3:0.1}", "{1:2.1}")), row.names = c(NA, -4L), class = c("data.table", 
    "data.frame"))

dt.all = structure(list(id = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L), N = c(1L, 
  2L, 1L, 2L, 1L, 2L, 3L, 1L), value = c(3.2, 14.2, 14, 3, 3.4, 
    3.9, 0.1, 2.1)), row.names = c(NA, -8L), class = c("data.table", 
      "data.frame"))
like image 594
dww Avatar asked Dec 08 '22 11:12

dww


1 Answers

I found that if I use gsub to wrap the item numbers in quotes, then I can parse these as json expressions using library(jsonlite).

dt[, .(value = {v1 <- unlist(fromJSON(gsub("([0-9]+):", '"\\1":', value)))}, 
       N = names(v1)),by=id]

#    id value N
# 1:  1   3.2 1
# 2:  1  14.2 2
# 3:  2  14.0 1
# 4:  2   3.0 2
# 5:  3   3.4 1
# 6:  3   3.9 2
# 7:  3   0.1 3
# 8:  4   2.1 1
like image 143
dww Avatar answered May 16 '23 03:05

dww