I want to use Rebol 3 to read a file in Latin1 and convert it to UTF-8. Is there a built-in function I can use, or some external library? Where I can find it?
Rebol has an invalid-utf? function that scours a binary value for a byte that is not part of a valid UTF-8 sequence. We can just loop until we've found and replaced all of them, then convert our binary value to a string:
latin1-to-utf8: function [binary [binary!]][
mark: :binary
while [mark: invalid-utf? mark][
change/part mark to char! mark/1 1
]
to string! binary
]
This function modifies the original binary. We can create a new string instead that leaves the binary value intact:
latin1-to-utf8: function [binary [binary!]][
mark: :binary
to string! rejoin collect [
while [mark: invalid-utf? binary][
keep copy/part binary mark ; keeps the portion up to the bad byte
keep to char! mark/1 ; converts the bad byte to good bytes
binary: next mark ; set the series beyond the bad byte
]
keep binary ; keep whatever is remaining
]
]
Bonus: here's a wee Rebmu version of the above—rebmu/args snippet #{DECAFBAD}
where snippet
is:
; modifying
IUgetLOAD"invalid-utf?"MaWT[MiuM][MisMtcTKm]tsA
; copying
IUgetLOAD"invalid-utf?"MaTSrjCT[wt[MiuA][kp copy/partAmKPtcFm AnxM]kpA]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With