
Flutter http response.body bad utf8 encoding

I'm starting to learn Flutter and I'm doing it by making my own manga reading app, in which I scrape all the data from the website I use the most.

My problem is that I can't scrape the data for one of the mangas I read, because of this error:

FormatException (FormatException: Bad UTF-8 encoding 0x22 (at offset 369))

My scraper code:

    Future<Manga> getMangaInfo(source) async {
      final response = await _client.get(source);
      var manga;
      print(response.body); // error occurs here
      final document = parse(response.body);

      final mangaInfo = document.getElementsByClassName('tamanho-bloco-perfil');
      for (Element infos in mangaInfo) {
        final infoCont = infos.getElementsByClassName('row');

        // get the title
        Element tituloCont = infoCont[0];
        final tituloH = tituloCont.getElementsByTagName('h2');
        Element tituloCont2 = tituloH[0];
        String titulo = '[' + tituloCont2.text + ']';
        //print(titulo);

        // get the cover image
        Element capaCont = infoCont[2];
        final capaImg = capaCont.getElementsByTagName('img');
        Element capaCont2 = capaImg[0];
        final capaUrl = capaCont2.attributes['src'];

        // get the most recent chapter
        final capsPorNumero = document.getElementsByClassName('row lancamento-linha');
        final caps = capsPorNumero[0].getElementsByTagName('a');
        Element info = caps[0];
        final numero = info.text.split(' ')[1];
        final capRecenteUrl = info.attributes['href'];

        manga = Manga(null, source, titulo, capaUrl, numero, capRecenteUrl);
      }
      return manga;
    }

The response.body that gives the error

I also tried decoding response.bodyBytes, but I still can't fix it.

Here's the link to the page: https://unionleitor.top/perfil-manga/kimetsu-no-yaiba

My guess is that the problem is the � character in the following meta tag in the HTML head:

<meta name="description" content="Kimetsu no Yaiba - Novo mangá sobrenatural da Shonen Jump. O mangá conta a história de Tanjiro, o filho mais velho de uma família que �">

I haven't found a solution yet; maybe I just looked in the wrong places. Can anyone help me solve this issue?
Thanks!

Guilherme Salomao asked Dec 06 '25 06:12


2 Answers

I just do:

    utf8.decode(response.bodyBytes);

even if you are getting JSON:

    jsonDecode(utf8.decode(response.bodyBytes))
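
When the bytes themselves are not valid UTF-8, as the truncated meta description in the question suggests, a strict utf8.decode still throws. A minimal sketch of a lenient variant follows, assuming it is acceptable for each invalid sequence to become the U+FFFD replacement character. The helper name decodeLenient and the sample bytes are made up for illustration.

```dart
import 'dart:convert';

/// Decodes [bytes] as UTF-8 without throwing on invalid sequences.
/// Each malformed sequence becomes U+FFFD (the replacement character).
String decodeLenient(List<int> bytes) =>
    utf8.decode(bytes, allowMalformed: true);

void main() {
  // 'que ' followed by a lone 0xC3 lead byte (a cut-off two-byte character)
  // and a closing quote (0x22), similar to the truncated meta description.
  final bytes = [...utf8.encode('que '), 0xC3, 0x22];

  try {
    utf8.decode(bytes); // strict decoding rejects the lone lead byte
  } on FormatException catch (e) {
    print('strict decode failed: ${e.message}');
  }

  // Lenient decoding returns a string instead of throwing.
  print(decodeLenient(bytes));
}
```

In the scraper this would mean passing decodeLenient(response.bodyBytes) to parse instead of response.body.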
Kevin Montalvo answered Dec 07 '25 20:12


Solution 1

In the absence of a declared charset, an HTTP body is assumed to be encoded in ISO-8859-1 (Latin-1), and the documented behaviour of body is consistent with this. If the server's response sets the Content-Type header to application/json; charset=utf-8, body should decode as expected.

The problem, of course, is that some servers do not set a charset for JSON. That is valid, but it falls into a grey area between the two specs:

JSON is always supposed to be UTF-8, and for that reason its spec says you don't need to set a charset. HTTP, however, defaults to ISO-8859-1 unless the charset is set explicitly. A "smart" HTTP client could choose to follow the JSON definition more closely than the HTTP definition and treat any application/json body as UTF-8 by default, technically violating the HTTP standard. The most robust solution, however, is for the server to state the charset explicitly, which is valid according to both standards.

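    import 'dart:convert';
    import 'dart:io';

    // _host, path and jsonData are assumed to be defined elsewhere.
    HttpClientRequest request = await HttpClient().post(_host, 4049, path)
      ..headers.contentType = ContentType.json // application/json; charset=utf-8
      ..write(jsonEncode(jsonData));
    HttpClientResponse response = await request.close();
    // Decode the response bytes explicitly as UTF-8.
    await response.transform(utf8.decoder).forEach(print);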
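
To make the charset rule above concrete, here is a rough sketch of choosing a decoder from a Content-Type header value, with Latin-1 as the fallback. The helper decodeWithCharset and its regular expression are hypothetical simplifications, not the http package's actual implementation.

```dart
import 'dart:convert';

/// Hypothetical helper: picks the decoder named by the charset parameter of a
/// Content-Type header value, falling back to Latin-1 when none is given.
String decodeWithCharset(List<int> bytes, String? contentType) {
  final match = RegExp(r'charset="?([^";\s]+)', caseSensitive: false)
      .firstMatch(contentType ?? '');
  // Encoding.getByName returns null for a missing or unknown name.
  final encoding = Encoding.getByName(match?.group(1)) ?? latin1;
  return encoding.decode(bytes);
}

void main() {
  final bytes = utf8.encode('mangá'); // the server sends UTF-8 bytes

  // No charset: the two bytes of 'á' are read as two Latin-1 characters.
  print(decodeWithCharset(bytes, 'application/json'));

  // Explicit charset: the text round-trips.
  print(decodeWithCharset(bytes, 'application/json; charset=utf-8'));
}
```

The first call shows the typical symptom of a missing charset: accented characters turn into two-character sequences rather than raising an error.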

Solution 2 (flutter)

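Use replaceAll on the decoded body to strip the � character. replaceAll returns a new string, so keep the result (here newString holds the decoded body):

    final cleaned = newString.replaceAll('�', '');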
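
A small sketch of the same idea as a reusable function, assuming the body has already been decoded to a String without throwing. The name stripReplacementChars and the sample text are illustrative only; '\uFFFD' is the escape for the � character.

```dart
/// Removes every U+FFFD replacement character from already-decoded [text].
String stripReplacementChars(String text) => text.replaceAll('\uFFFD', '');

void main() {
  // Sample text ending in a replacement character, like the meta description.
  const description = 'filho mais velho de uma família que \uFFFD';
  print(stripReplacementChars(description));
}
```

Writing the character as an escape avoids depending on how the source file itself is encoded.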

Solution 3 (php)

Fetch the content through a PHP script first, strip the character there with str_replace, and point your app at that script instead of the original URL:

    $curlSession = curl_init();
    curl_setopt($curlSession, CURLOPT_URL, 'YOUR-URL');
    curl_setopt($curlSession, CURLOPT_BINARYTRANSFER, true);
    curl_setopt($curlSession, CURLOPT_RETURNTRANSFER, true);

    $jsonData = curl_exec($curlSession);
    echo $bodytag = str_replace("�", "", $jsonData);

    curl_close($curlSession);

Hope it helps.

Mikel Tawfik answered Dec 07 '25 21:12