
How to parse JSON in a DataFrame column using R

Tags: json, r

How do I get from here ...

| ID | JSON Request                                                          |
==============================================================================
|  1 | {"user":"xyz1","weightmap": {"P1":0,"P2":100}, "domains":["a1","b1"]} |
------------------------------------------------------------------------------
|  2 | {"user":"xyz2","weightmap": {"P1":100,"P2":0}, "domains":["a2","b2"]} |
------------------------------------------------------------------------------

to here (the requirement is to turn the JSON in column 2 into a table):

| User | P1 | P2 | domains | 
============================
| xyz1 |  0 |100 | a1, b1  |
----------------------------
| xyz2 |100 | 0  | a2, b2  |
----------------------------

Here is the code to generate the data.frame:

raw_df <- 
  data.frame(
    id   = 1:2,
    json = 
      c(
        '{"user": "xyz2", "weightmap": {"P1":100,"P2":0}, "domains": ["a2","b2"]}', 
        '{"user": "xyz1", "weightmap": {"P1":0,"P2":100}, "domains": ["a1","b1"]}'
      ), 
    stringsAsFactors = FALSE
  )
parishodak asked Feb 01 '17 20:02



2 Answers

Here's a tidyverse solution (also using jsonlite) if you're happy to work in a long format (for domains in this case):

library(jsonlite)
library(dplyr)
library(purrr)
library(tidyr)

d <- data.frame(
  id = c(1, 2),
  json = c(
    '{"user":"xyz1","weightmap": {"P1":0,"P2":100}, "domains":["a1","b1"]}',
    '{"user":"xyz2","weightmap": {"P1":100,"P2":0}, "domains":["a2","b2"]}'
  ),
  stringsAsFactors = FALSE
)

d %>% 
  mutate(json = map(json, ~ fromJSON(.) %>% as.data.frame())) %>% 
  unnest(json)
#>   id user weightmap.P1 weightmap.P2 domains
#> 1  1 xyz1            0          100      a1
#> 2  1 xyz1            0          100      b1
#> 3  2 xyz2          100            0      a2
#> 4  2 xyz2          100            0      b2
  • mutate(...) parses each JSON string and converts it to a data frame, giving a list-column of nested data frames.
  • unnest(...) expands those nested data frames into regular columns, producing one row per domain.
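
If the wide layout from the question is needed (one row per user, with the domains joined into a single string), the long result can be collapsed again. The following is only a sketch built on the pipeline above: the group_by/summarise step and the rename to P1 and P2 are additions that are not part of the original answer, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(jsonlite)
library(dplyr)
library(purrr)
library(tidyr)

d <- data.frame(
  id = c(1, 2),
  json = c(
    '{"user":"xyz1","weightmap": {"P1":0,"P2":100}, "domains":["a1","b1"]}',
    '{"user":"xyz2","weightmap": {"P1":100,"P2":0}, "domains":["a2","b2"]}'
  ),
  stringsAsFactors = FALSE
)

wide <- d %>%
  # Parse each JSON string into a small data frame (one row per domain)
  mutate(json = map(json, ~ fromJSON(.) %>% as.data.frame())) %>%
  unnest(json) %>%
  # Collapse the per-domain rows back into one row per record
  group_by(id, user, weightmap.P1, weightmap.P2) %>%
  summarise(domains = paste(domains, collapse = ", "), .groups = "drop") %>%
  # Shorten the flattened weightmap names to match the requested table
  rename(P1 = weightmap.P1, P2 = weightmap.P2)

print(wide)
```

Grouping by every column except domains keeps the other values intact, which assumes they are constant within each id, as they are in this example.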
Simon Jackson answered Oct 05 '22 10:10


I could not get the flatten parameter to work as I expected, so I needed to unlist and then "re-list" each result before rbind-ing them with do.call:

library(jsonlite)
# Parse each JSON string, flatten it to a named vector, then stack the rows
do.call(rbind,
        lapply(raw_df$json,
               function(j) as.list(unlist(fromJSON(j, flatten = TRUE)))))
     user   weightmap.P1 weightmap.P2 domains1 domains2
[1,] "xyz2" "100"        "0"          "a2"     "b2"    
[2,] "xyz1" "0"          "100"        "a1"     "b1"    

Admittedly, this will require further processing, since unlist coerces all the values to character.
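
One possible form of that further processing, offered as a sketch rather than as part of the original answer: skip the as.list step so that rbind yields a plain character matrix, convert it to a data frame, and let base R's type.convert restore the numeric columns. It assumes every record has the same fields and the same number of domains, so that all rows have equal length.

```r
library(jsonlite)

raw_df <- data.frame(
  id   = 1:2,
  json = c(
    '{"user": "xyz2", "weightmap": {"P1":100,"P2":0}, "domains": ["a2","b2"]}',
    '{"user": "xyz1", "weightmap": {"P1":0,"P2":100}, "domains": ["a1","b1"]}'
  ),
  stringsAsFactors = FALSE
)

# Flatten each parsed record to a named character vector
flat <- lapply(raw_df$json, function(j) unlist(fromJSON(j, flatten = TRUE)))

# Stack the vectors into a character matrix, then make it a data frame
res <- as.data.frame(do.call(rbind, flat), stringsAsFactors = FALSE)

# Re-infer column types; as.is = TRUE keeps strings as character
res[] <- lapply(res, type.convert, as.is = TRUE)

str(res)
```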

IRTFM answered Oct 05 '22 12:10