Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to parse (in R) this API call into a .txt table format? (related to "open government" of Israel :) )

Tags:

parsing

r

Israel has released it's budget for all to see, and there is an API to extract the data. However, I don't know how to parse it into a txt/csv format.

Here is an example link to make a call for data.

Here is the output:

[
    {
        "parent": [
            {
                "budget_id": "00", 
                "title": "המדינה"
            }
        ], 
        "net_amount_revised": 6075053, 
        "year": 2003, 
        "title": "השכלה גבוהה", 
        "gross_amount_used": 5942975, 
        "gross_amount_revised": 5942975, 
        "budget_id": "0021", 
        "net_amount_used": 5936491, 
        "inflation_factor": 1.15866084989269, 
        "net_amount_allocated": 5861591, 
        "gross_amount_allocated": 5861591
    }, 
    {
        "parent": [
            {
                "budget_id": "0021", 
                "title": "השכלה גבוהה"
            }, 
            {
                "budget_id": "00", 
                "title": "המדינה"
            }
        ], 
        "net_amount_revised": 5364976, 
        "year": 2003, 
        "title": "השתתפות בתקציב המוסדות להשכלה גבוהה", 
        "gross_amount_used": 5337585, 
        "gross_amount_revised": 5337584, 
        "budget_id": "002102", 
        "net_amount_used": 5331101, 
        "inflation_factor": 1.15866084989269, 
        "net_amount_allocated": 4985915, 
        "gross_amount_allocated": 4985915
    }, 
    {
        "parent": [
            {
                "budget_id": "0021", 
                "title": "השכלה גבוהה"
            }, 
            {
                "budget_id": "00", 
                "title": "המדינה"
            }
        ], 
        "net_amount_revised": 565495, 
        "year": 2003, 
        "title": "השתתפות בפעולות", 
        "gross_amount_used": 462490, 
        "gross_amount_revised": 462490, 
        "budget_id": "002103", 
        "net_amount_used": 462490, 
        "inflation_factor": 1.15866084989269, 
        "net_amount_allocated": 559293, 
        "gross_amount_allocated": 559293
    }, 
    {
        "parent": [
            {
                "budget_id": "0021", 
                "title": "השכלה גבוהה"
            }, 
            {
                "budget_id": "00", 
                "title": "המדינה"
            }
        ], 
        "net_amount_revised": 0, 
        "year": 2003, 
        "title": "רזרבה להתייקרויות", 
        "gross_amount_used": 0, 
        "gross_amount_revised": null, 
        "budget_id": "002105", 
        "net_amount_used": null, 
        "inflation_factor": 1.15866084989269, 
        "net_amount_allocated": 171801, 
        "gross_amount_allocated": 171801
    }, 
    {
        "parent": [
            {
                "budget_id": "0021", 
                "title": "השכלה גבוהה"
            }, 
            {
                "budget_id": "00", 
                "title": "המדינה"
            }
        ], 
        "net_amount_revised": 108000, 
        "year": 2003, 
        "title": "פיתוח מוסדות להשכלה    גבוהה", 
        "gross_amount_used": 108000, 
        "gross_amount_revised": 108000, 
        "budget_id": "002106", 
        "net_amount_used": 108000, 
        "inflation_factor": 1.15866084989269, 
        "net_amount_allocated": 108000, 
        "gross_amount_allocated": 108000
    }, 
    {
        "parent": [
            {
                "budget_id": "0021", 
                "title": "השכלה גבוהה"
            }, 
            {
                "budget_id": "00", 
                "title": "המדינה"
            }
        ], 
        "net_amount_revised": 23634, 
        "year": 2003, 
        "title": "תחום פעולה כללי", 
        "gross_amount_used": 23634, 
        "gross_amount_revised": 23634, 
        "budget_id": "002101", 
        "net_amount_used": 23634, 
        "inflation_factor": 1.15866084989269, 
        "net_amount_allocated": 23634, 
        "gross_amount_allocated": 23634
    }, 
    {
        "parent": [
            {
                "budget_id": "0021", 
                "title": "השכלה גבוהה"
            }, 
            {
                "budget_id": "00", 
                "title": "המדינה"
            }
        ], 
        "net_amount_revised": 12948, 
        "year": 2003, 
        "title": "פעולות עם משרדים       ומוסדות אחרים", 
        "gross_amount_used": 11266, 
        "gross_amount_revised": 11266, 
        "budget_id": "002104", 
        "net_amount_used": 11266, 
        "inflation_factor": 1.15866084989269, 
        "net_amount_allocated": 12948, 
        "gross_amount_allocated": 12948
    }
]

What would be the way to parse this into a table format?

Thanks!

Tal

like image 681
Tal Galili Avatar asked Mar 18 '11 12:03

Tal Galili


4 Answers

If you install the rjson package You should be able to do:

do.call( 'rbind', fromJSON( file="http://budget.yeda.us/0021?year=2003&depth=1" ) )

[edit]

Actually.. that has problems with the variable length internal parent list, but should get you half way there

like image 125
tim_yates Avatar answered Nov 03 '22 21:11

tim_yates


Yep, it's JSON. fromJSON will turn it into a list for you

resp <- getURL("http://budget.yeda.us/0021?year=2003&depth=1")
library(rjson)
resp <- fromJSON(resp)

That gets you to list form. For a data frame, try:

library(plyr)
resp <- llply(resp, function(x) llply(x, function(y) ifelse(is.null(y), "NULL", y)))
budget <- data.frame()
for(i in 1:length(resp)) {
  budget <- rbind.fill(budget, data.frame(resp[[i]]))
}

The nested llply take care of some unpleasantness when creating data frames that include null values.

like image 36
Noah Avatar answered Nov 03 '22 21:11

Noah


Looks like JSON. Try the rjson package, but some looping or tricky listy fiddling may be required.

Lunchtime now, otherwise I'd have a solution pasted in. Give the non-lunching portion of the hive mind a few minutes....

like image 43
Spacedman Avatar answered Nov 03 '22 20:11

Spacedman


This is similar to a previous answer but also creates columns for the second budget_id and title fields in parent, not just the first and is structured slightly differently operating on the parent and the rest separately and then putting them back together.

library(rjson)
library(plyr)
js <- fromJSON(file = "http://budget.yeda.us/0021?year=2003&depth=1")

toDF <- function(x) do.call("rbind.fill", lapply(x, as.data.frame))
Null2NA <- function(x) if (is.null(x)) NA else x

parent1 <- lapply(js, "[[", "parent")
rest <- lapply(js, function(x) lapply(x[-1], Null2NA))
cbind(toDF(parent1), toDF(rest))
like image 1
G. Grothendieck Avatar answered Nov 03 '22 20:11

G. Grothendieck