Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Download data from Statistics Sweden. Swedish characters in query

Tags:

r

httr

I have managed to retrieve some data from Statistics Sweden using the web site api. The answers to this question solved most of my problems.

How do I POST a JSON formatted request to GET JSON data from a URL in R into the data.frame in a less verbose manner?

But I still have two problems.

If I have characters with umlauts in my json-question (like “Å”, “Ä”, “Ö”) I get a “404” response from the server.

I’m trying to download data from this table:

Population 16+ years (RAMS) by region, employment, age and sex. Year 2004 - 2015

(You can get the query to the api on the web site if you click “Continue” and then “api for this table”, but you have to change the response format from "px" to "json".)

This code works:

library(jsonlite)
library(httr)

bodytxt <- '{
  "query": [
{
  "code": "Region",
  "selection": {
  "filter": "vs:RegionKommun07",
  "values": [
  "0114",
  "1280"
  ]
  }
},
  {
  "code": "Alder",
  "selection": {
  "filter": "item",
  "values": [
  "16-19"
  ]
  }
  },
  {
  "code": "Tid",
  "selection": {
  "filter": "item",
  "values": [
  "2015"
  ]
  }
  }
  ],
  "response": {
  "format": "json"
  }
  }'


req <- POST("http://api.scb.se/OV0104/v1/doris/en/ssd/START/AM/AM0207/AM0207H/BefSyssAldKonK",
            body = bodytxt, encode = "json")

stop_for_status(req)
json <- content(req, "text")


# JSON starts with an invalid character:
validate(json)
json <- substring(json, 2)
validate(json)

# Now we can parse
object <- fromJSON(json)
print(object)

But if I change the query so it includes a “Ö”, it doesn’t work. Example:

bodytxt <- '{
  "query": [
    {
      "code": "Region",
      "selection": {
        "filter": "vs:RegionKommun07",
        "values": [
          "0114",
          "1280"
          ]
      }
    },
    {
      "code": "Sysselsattning",
      "selection": {
        "filter": "item",
        "values": [
          "FÖRV"
          ]
      }
    },
    {
      "code": "Alder",
      "selection": {
        "filter": "item",
        "values": [
          "16-19"
          ]
      }
    },
    {
      "code": "Tid",
      "selection": {
        "filter": "item",
        "values": [
          "2015"
          ]
      }
    }
    ],
  "response": {
    "format": "json"
  }
}'

The other problem I have is that, as far as I understand, it should be possible to change the json query to a list and include the list in the call to the server, but I get a "404"-error. Example:

body_list <- fromJSON(bodytxt)
req <- POST("http://api.scb.se/OV0104/v1/doris/en/ssd/START/AM/AM0207/AM0207H/BefSyssAldKonK",
            body = body_list, encode = "json")

What am I doing wrong?

Ps! I know that it exists an excellent package on CRAN that is named pxweb that is very easy to use to download data from Statistics Sweden. But I want to learn the api and pxwed doesn’t let me skip dimensions in the query.

System: Windows 7, r script saved in utf-8 encoding.

like image 461
ChristianL Avatar asked Feb 20 '17 08:02

ChristianL


1 Answers

Try these parameters for fromJSON():

library(httr)
library(jsonlite)

your data:

bodytxt <- '{
  "query": [
    {
      "code": "Region",
      "selection": {
        "filter": "vs:RegionKommun07",
        "values": [
          "0114",
          "1280"
          ]
      }
    },
    {
      "code": "Sysselsattning",
      "selection": {
        "filter": "item",
        "values": [
          "FÖRV"
          ]
      }
    },
    {
      "code": "Alder",
      "selection": {
        "filter": "item",
        "values": [
          "16-19"
          ]
      }
    },
    {
      "code": "Tid",
      "selection": {
        "filter": "item",
        "values": [
          "2015"
          ]
      }
    }
    ],
  "response": {
    "format": "json"
  }
}'

Modified fromJSON() conversion:

query <- jsonlite::fromJSON(bodytxt,
                            simplifyVector=FALSE,
                            simplifyDataFrame=FALSE)

Same call you made (added verbose() for my benefit but you can remove that):

URL <- "http://api.scb.se/OV0104/v1/doris/en/ssd/START/AM/AM0207/AM0207H/BefSyssAldKonK"
req <- POST(URL, body=query, encode="json", verbose())

The result:

content(req, simplifyDataFrame=TRUE)

## Warning: JSON string contains (illegal) UTF8 byte-order-mark!

## $columns
##             code                        text type
## 1         Region                      region    d
## 2 Sysselsattning           employment status    d
## 3          Alder                         age    d
## 4            Tid                        year    t
## 5       AM0207F2 Population 16+ years (RAMS)    c
## 
## $comments
## list()
## 
## $data
##                       key values
## 1 0114, FÖRV, 16-19, 2015    379
## 2 1280, FÖRV, 16-19, 2015   1443

The warning is due to the fact that the API server is sending a byte order mark back with the result (it's a Microsoft server so they're kind of brain dead when it comes to this). jsonlite (the bit inside content() that does the conversion) can handle it.

like image 89
hrbrmstr Avatar answered Nov 05 '22 00:11

hrbrmstr