
Encoding JSON in R

Tags:

json

r

encoding

How can I change the encoding when using a JSON package in R?

for (pageNum in 0:20) {
  data <- fromJSON(paste0("https://api.hh.ru/vacancies?text=\"бухгалтер\"&page=", pageNum))
  vacanciesdf <- rbind(vacanciesdf, data.frame(
    data$items$area$name, 
    data$items$salary$currency, 
    data$items$salary$from, 
    data$items$employer$name,
    data$items$name,
    data$items$snippet$requirement))
  print(paste0("Upload pages:", pageNum + 1))
  Sys.sleep(3)
}

With an English keyword, downloading from the API works, but with a Russian keyword nothing is loaded. I assume the problem is the encoding. How do I set it to UTF-8?

Alexander Botvin asked Dec 28 '17 05:12




1 Answer

These kinds of problems are hard to reproduce, but fetching the response with httr's GET, decoding it as UTF-8 with content, and only then passing the text to fromJSON often resolves the problem.

The URL provided in your question returns an error, so this solution demonstrates the principle by requesting the argument list from the API you are using.

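library(httr)      # GET() and content()
library(jsonlite)  # fromJSON()

URL <- "https://api.hh.ru/vacancies?describe_arguments=true"
# Fetch the response and decode its body as UTF-8 text
text <- content(GET(URL), as = "text", encoding = "UTF-8")
# Parse the decoded JSON string into R objects
data <- fromJSON(text)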

After this, text holds the response body decoded as UTF-8, and data holds the parsed JSON.
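
Applied to the query in the question, the same idea might look like the sketch below. It assumes the empty result came from the Cyrillic keyword being placed in the URL without percent-encoding, which I could not verify. build_url and fetch_page are hypothetical helper names, not part of httr or jsonlite, and both packages need to be installed for fetch_page to run.

```r
# Hypothetical helper: build the request URL with a percent-encoded keyword.
# enc2utf8() makes sure the bytes that get encoded are UTF-8, and
# reserved = TRUE also escapes characters such as quotes and ampersands.
build_url <- function(keyword, page_num) {
  paste0(
    "https://api.hh.ru/vacancies?text=",
    utils::URLencode(enc2utf8(keyword), reserved = TRUE),
    "&page=", page_num
  )
}

# Hypothetical helper: fetch one page, decode the body as UTF-8, then parse it.
fetch_page <- function(keyword, page_num) {
  resp <- httr::GET(build_url(keyword, page_num))
  httr::stop_for_status(resp)  # stop with an error on a non-2xx response
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Usage, mirroring the first iteration of the loop in the question:
# data <- fetch_page("бухгалтер", 0)
# data$items$name
```

Keeping the URL construction in its own function makes it easy to print and inspect the exact request before sending it, which helps when checking whether the encoding is really the cause.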

GGAnderson answered Sep 22 '22 06:09
