 

Getting imported json data into a data frame

I have a file containing over 1500 json objects that I want to work with in R. I've been able to import the data as a list, but am having trouble coercing it into a useful structure. I want to create a data frame containing a row for each json object and a column for each key:value pair.

I've recreated my situation with this small, fake data set:

[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null},
 {"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
 {"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null},
 {"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865},
 {"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221},
 {"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413},
 {"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]

Some features of the data:

  • The objects all contain the same number of key:value pairs, although some of the values are null
  • There are two non-numeric fields per object (name and group)
  • name is the unique identifier; there are 10 or so groups
  • Many of the name and group entries contain spaces, commas and other punctuation.
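Because every object carries the same keys, the data frame can also be assembled one column at a time, which avoids calling as.data.frame() on each record. The following is a minimal base-R sketch, assuming (as the answer below does) that JSON nulls come back from fromJSON() as NULL list elements. The list records is a hand-written stand-in for the parsed first three sample objects, and records, keys and columns are illustrative names, not part of any package.

```r
# Hand-written stand-in for the parsed first three sample objects.
# Assumption: JSON nulls arrive from fromJSON() as NULL list elements.
records <- list(
  list(name = "Doe, John", group = "Red", `age (y)` = 24,
       `height (cm)` = 182, `wieght (kg)` = 74.8, score = NULL),
  list(name = "Doe, Jane", group = "Green", `age (y)` = 30,
       `height (cm)` = 170, `wieght (kg)` = 70.1, score = 500),
  list(name = "Smith, Joan", group = "Yellow", `age (y)` = 41,
       `height (cm)` = 169, `wieght (kg)` = 60, score = NULL)
)

# Every record has the same keys, so take the key names from the first one
keys <- names(records[[1]])

# Build one vector per key. NULL values become NA so each column gets exactly
# one entry per record; unlist() keeps numeric fields numeric.
columns <- lapply(setNames(keys, keys), function(key) {
  unlist(lapply(records, function(r) if (is.null(r[[key]])) NA else r[[key]]))
})

# check.names = FALSE keeps names such as "age (y)" unchanged
df <- as.data.frame(columns, check.names = FALSE, stringsAsFactors = FALSE)
str(df)
```

Building column vectors first means each column's type is decided once, from all of its values, rather than row by row.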

Based on this question: R list(structure(list())) to data frame, I tried the following:

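library(RJSONIO)  # assumed JSON package (the answer below uses RJSONIO); provides fromJSON()
library(plyr)     # provides rbind.fill()

json_file <- "test.json"
json_data <- fromJSON(json_file)
asFrame <- do.call("rbind.fill", lapply(json_data, as.data.frame))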

With both my real data and this fake data, the last line gives me this error:

Error in data.frame(name = "Doe, John", group = "Red", `age (y)` = 24,  :
  arguments imply differing number of rows: 1, 0
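The "1, 0" in that message suggests that one of the fields came through with length zero. A quick way to see which records and fields are affected is to look for NULL elements in the parsed list. This is a sketch only: it assumes, as the answer below does, that JSON nulls arrive as NULL list elements, and it uses a hand-written stand-in for the parsed data, trimmed to three records and three fields. The names records, null_fields and has_null are illustrative.

```r
# Trimmed, hand-written stand-in for the parsed list.
# Assumption: JSON nulls come back from fromJSON() as NULL list elements.
records <- list(
  list(name = "Doe, John", group = "Red", score = NULL),
  list(name = "Doe, Jane", group = "Green", score = 500),
  list(name = "Smith, Joan", group = "Yellow", score = NULL)
)

# For each record, the names of the fields whose value is NULL
null_fields <- lapply(records, function(r) names(r)[vapply(r, is.null, logical(1))])

# Indices of the records containing at least one NULL field
has_null <- which(lengths(null_fields) > 0)
has_null
null_fields[has_null]
```

On the full sample, the same check would flag the records for "Doe, John" and "Smith, Joan", the two with "score":null.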
asked Jun 05 '13 at 18:06 by Andrew Staroscik





1 Answer

You just need to replace your NULLs with NAs:

library(RJSONIO)

json_file <- '[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null},
  {"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
  {"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null},
  {"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865},
  {"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221},
  {"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413},
  {"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]'

json_file <- fromJSON(json_file)

json_file <- lapply(json_file, function(x) {
  # replace NULL elements (the JSON nulls) with NA so every key has a value
  x[sapply(x, is.null)] <- NA
  # flatten each record into a named vector; mixed types become character
  unlist(x)
})

Once you have a non-null value for each element, you can call rbind without getting an error:

do.call("rbind", json_file)
     name           group    age (y) height (cm) wieght (kg) score
[1,] "Doe, John"    "Red"    "24"    "182"       "74.8"      NA
[2,] "Doe, Jane"    "Green"  "30"    "170"       "70.1"      "500"
[3,] "Smith, Joan"  "Yellow" "41"    "169"       "60"        NA
[4,] "Brown, Sam"   "Green"  "22"    "183"       "75"        "865"
[5,] "Jones, Larry" "Green"  "31"    "178"       "83.9"      "221"
[6,] "Murray, Seth" "Red"    "35"    "172"       "76.2"      "413"
[7,] "Doe, Jane"    "Yellow" "22"    "164"       "68"        "902"
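Note that this result is a character matrix, not the data frame the question asks for: unlist() coerces each record to a character vector, so the numbers come back as strings. One way to finish the job is to convert the matrix with as.data.frame() and let type.convert() restore the numeric columns. This is a base-R sketch that repeats the NULL-to-NA step on a hand-written stand-in for the first three parsed records. The names records, rows, mat and df are illustrative.

```r
# Hand-written stand-in for the first three records as parsed by fromJSON()
# (assumption: JSON nulls arrive as NULL list elements)
records <- list(
  list(name = "Doe, John", group = "Red", `age (y)` = 24,
       `height (cm)` = 182, `wieght (kg)` = 74.8, score = NULL),
  list(name = "Doe, Jane", group = "Green", `age (y)` = 30,
       `height (cm)` = 170, `wieght (kg)` = 70.1, score = 500),
  list(name = "Smith, Joan", group = "Yellow", `age (y)` = 41,
       `height (cm)` = 169, `wieght (kg)` = 60, score = NULL)
)

# Same step as the answer: NULL -> NA, then flatten each record
rows <- lapply(records, function(x) {
  x[sapply(x, is.null)] <- NA
  unlist(x)
})

mat <- do.call("rbind", rows)  # character matrix, like the output above

# as.data.frame() on a matrix keeps the column names as they are
df <- as.data.frame(mat, stringsAsFactors = FALSE)

# type.convert() turns all-numeric character columns back into numbers;
# as.is = TRUE leaves text columns such as name as character, not factor
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

Converting the matrix with data.frame(mat) instead would, by default, rewrite names such as "age (y)" into syntactic ones like age..y., unless check.names = FALSE is passed.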
answered Oct 12 '22 at 23:10 by SchaunW
