I have a file containing over 1500 JSON objects that I want to work with in R. I've been able to import the data as a list, but I'm having trouble coercing it into a useful structure. I want to create a data frame containing a row for each JSON object and a column for each key:value pair.
I've recreated my situation with this small, fake data set:
[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null},
 {"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
 {"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null},
 {"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865},
 {"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221},
 {"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413},
 {"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]
Some features of the data:
Based on this question: R list(structure(list())) to data frame, I tried the following:
json_file <- "test.json"
json_data <- fromJSON(json_file)
asFrame <- do.call("rbind.fill", lapply(json_data, as.data.frame))
With both my real data and this fake data, the last line gives me this error:
Error in data.frame(name = "Doe, John", group = "Red", `age (y)` = 24, : arguments imply differing number of rows: 1, 0
You just need to replace your NULLs with NAs:
require(RJSONIO)
json_file <- '[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null},
 {"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
 {"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null},
 {"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865},
 {"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221},
 {"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413},
 {"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]'
json_file <- fromJSON(json_file)
json_file <- lapply(json_file, function(x) {
  x[sapply(x, is.null)] <- NA
  unlist(x)
})
Once you have a non-null value for each element, you can call rbind without getting an error:
do.call("rbind", json_file)
     name           group    age (y) height (cm) wieght (kg) score
[1,] "Doe, John"    "Red"    "24"    "182"       "74.8"      NA
[2,] "Doe, Jane"    "Green"  "30"    "170"       "70.1"      "500"
[3,] "Smith, Joan"  "Yellow" "41"    "169"       "60"        NA
[4,] "Brown, Sam"   "Green"  "22"    "183"       "75"        "865"
[5,] "Jones, Larry" "Green"  "31"    "178"       "83.9"      "221"
[6,] "Murray, Seth" "Red"    "35"    "172"       "76.2"      "413"
[7,] "Doe, Jane"    "Yellow" "22"    "164"       "68"        "902"
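Note that rbind returns a character matrix here, not the data frame the question asks for. One possible way to finish the job is sketched below in base R. It is only a sketch under some assumptions: the records list is a hand-built, hypothetical stand-in for what fromJSON() returns (shortened to three objects and four keys), and the numeric_cols vector is a name introduced here for illustration, listing the columns that are assumed to hold numbers.

```r
# Hypothetical stand-in for the parsed JSON: one named list per object,
# with JSON null represented as NULL.
records <- list(
  list(name = "Doe, John",   group = "Red",    `age (y)` = 24, score = NULL),
  list(name = "Doe, Jane",   group = "Green",  `age (y)` = 30, score = 500),
  list(name = "Smith, Joan", group = "Yellow", `age (y)` = 41, score = NULL)
)

# Same step as above: replace NULL with NA, then flatten each record
# into a named character vector.
rows <- lapply(records, function(x) {
  x[vapply(x, is.null, logical(1))] <- NA
  unlist(x)
})

# Stack the vectors into a character matrix, then convert it to a data frame.
mat <- do.call("rbind", rows)
df <- as.data.frame(mat, stringsAsFactors = FALSE)

# Every column is character at this point; convert the ones assumed numeric.
numeric_cols <- c("age (y)", "score")
df[numeric_cols] <- lapply(df[numeric_cols], as.numeric)

str(df)
```

Column names containing spaces or parentheses, such as age (y), have to be accessed with df[["age (y)"]] or with backticks after the $ operator.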