
Decode HTML encoded text in Dart

Tags:

dart

decoder

It seems that Dart does not provide a default mechanism (or at least I could not find it) to decode HTML escaped entities.

What I'd like to do is convert, e.g., Q&amp;A to Q&A. (This is just an example.)

As of version 1.11.1, Dart encodes these like so.

From there it is rather simple to create a custom converter implementation, but that would not cover all the use cases. For example: what if < is expressed as a hexadecimal character reference such as &#x3C;?

Anyone got some pretty solution?

Daniel V. asked Jul 13 '15 15:07


2 Answers

I just made a small but complete Dart library for that exact purpose: html_unescape.

It supports:

  • Named Character References (&nbsp;)
    • 2099 of them
  • Decimal Character References (&#225;)
  • Hexadecimal Character References (&#xE3;)
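
To make the three reference forms above concrete, here is a minimal sketch in plain Dart. It is not html_unescape's implementation: `decodeEntities`, `_named`, and `_reference` are hypothetical names chosen for illustration, and the named table is assumed to hold only five entries instead of the full set.

```dart
// Illustrative sketch only, not the html_unescape source.
// Hypothetical five-entry table; a real decoder needs the full HTML list.
const Map<String, String> _named = {
  'amp': '&',
  'lt': '<',
  'gt': '>',
  'quot': '"',
  'apos': "'",
};

// Matches &#xHEX;, &#DEC;, or &name; and captures the part between & and ;.
final RegExp _reference =
    RegExp(r'&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);');

String decodeEntities(String input) {
  return input.replaceAllMapped(_reference, (match) {
    final body = match.group(1)!;
    if (body.startsWith('#')) {
      final isHex = body[1] == 'x' || body[1] == 'X';
      final digits = body.substring(isHex ? 2 : 1);
      final codePoint = int.tryParse(digits, radix: isHex ? 16 : 10);
      // Leave unparsable, out-of-range, or surrogate values untouched.
      if (codePoint == null ||
          codePoint > 0x10FFFF ||
          (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
        return match.group(0)!;
      }
      return String.fromCharCode(codePoint);
    }
    // Unknown names are passed through unchanged.
    return _named[body] ?? match.group(0)!;
  });
}

void main() {
  print(decodeEntities('Q&amp;A')); // Q&A
  print(decodeEntities('&#60; &#x3C; &lt;')); // < < <
  print(decodeEntities('&#225; &#xE3;')); // á ã
  print(decodeEntities('&unknown; stays')); // &unknown; stays
}
```

A five-entry table is the part that does not scale, which is why a library with the complete list of named references is the safer choice for real input.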

Sync use

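import 'package:html_unescape/html_unescape.dart';

void main() {
  // One converter instance can be reused for any number of strings.
  final unescape = HtmlUnescape();
  final text = unescape.convert("&lt;strong&#62;This &quot;escaped&quot; string");
  print(text); // <strong>This "escaped" string
}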

Async use

You can also use the converter to transform a stream. For example, the code below will transform a POSIX stdin into an HTML-unencoded stdout.

// Needs dart:io (stdin, stdout) and dart:convert (Utf8Decoder, Utf8Encoder).
await stdin
    .transform(Utf8Decoder())
    .transform(HtmlUnescape())
    .transform(Utf8Encoder())
    .pipe(stdout);

More info + docs on pub.

filiph answered Sep 28 '22 01:09


I think Dart/Flutter can do it for you by itself:

import 'dart:html' as html;

// dart:html is only available on the web. In production, use the
// universal_html package (universal_html: ^1.1.18) and import
// 'package:universal_html/html.dart' as html; instead.

void main() {
  String badString =
      'This &quot; string &quot; will be<strong> printed normally. &lt; &apos; &gt; </strong> &#62;';
  print(_parseHtmlString(badString));
}

String _parseHtmlString(String htmlString) {
  var text = html.Element.span()..appendHtml(htmlString);
  return text.innerText;
}

// It prints: This " string " will be printed normally. < ' > >

Dmitry_Kovalov answered Sep 28 '22 02:09