Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Assign categories from JSON column in data table

I have a data.table that looks like dt (provided at the end), which includes columns EI and CutsLabsCV.

CutsLabsCV is a JSON of two values indicating cuts and labels that should be used for classifying my data set.

I need to create a new column EIRange, that classifies the EI column based on cuts and labels provided on CutsLabsCV.

The expected result should be:

          EI                                    CutsLabsCV EIRange
       <num>                                        <list>  <fctr>
 1: 101.9163 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 2: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 3: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 4: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 5: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 6: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 7: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 8: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 9: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120
10: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120

How can I achieve this in data.table? I need the most efficient solution since my real dataset exceeds 2M rows.

Sample dataset:

dt <- data.table(EI = c(101.91625, 122.03178865, 122.03178865,
                        122.03178865, 122.03178865, 122.03178865, 122.03178865, 122.03178865,
                        109.521980125, 109.521980125),
                 CutsLabsCV = list(c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",90,130,\"Inf\"]", "[\"<90\",\"90-130\",\">130\"]"),
                                   c("[\"-Inf\",85,120,\"Inf\"]", "[\"<85\",\"85-120\",\">120\"]"),
                                   c("[\"-Inf\",85,120,\"Inf\"]", "[\"<85\",\"85-120\",\">120\"]")))
like image 462
thiagoveloso Avatar asked Nov 03 '25 08:11

thiagoveloso


2 Answers

One possible solution :

dt[,EIRange:=sapply(CutsLabsCV,\(x) jsonlite::fromJSON(x[2])[2])]
dt

#           EI                                    CutsLabsCV EIRange
#        <num>                                        <list>  <char>
#  1: 101.9163 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  2: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  3: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  4: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  5: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  6: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  7: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  8: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
#  9: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120
# 10: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120
like image 190
Waldi Avatar answered Nov 04 '25 21:11

Waldi


Avoiding sapply(), unlist() + matrix() + sub() might be an alternative

dt[, EIRange := sub('.*"(\\d+-\\d+)".*', '\\1', matrix(unlist(CutsLabsCV, FALSE, FALSE), ncol = 2L, byrow = TRUE)[, 2L])]
> dt 
      EI                                        CutsLabsCV  EIRange
       <num>                                        <list>  <char>
 1: 101.9163 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 2: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 3: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 4: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 5: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 6: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 7: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 8: 122.0318 ["-Inf",90,130,"Inf"],["<90","90-130",">130"]  90-130
 9: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120
10: 109.5220 ["-Inf",85,120,"Inf"],["<85","85-120",">120"]  85-120
like image 42
Friede Avatar answered Nov 04 '25 21:11

Friede



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!