
Perform file encoding conversion with Rebol 3

I want to use Rebol 3 to read a file encoded in Latin-1 and convert it to UTF-8. Is there a built-in function I can use, or an external library? If so, where can I find it?

asked Dec 15 '22 01:12 by giuliolunati


1 Answer

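Rebol 3 has an invalid-utf? function that scans a binary value and returns the position of the first byte that is not part of a valid UTF-8 sequence, or none if every byte is valid. Because each Latin-1 byte value equals its Unicode code point, we can replace each such byte with the UTF-8 encoding of the corresponding character. We loop until no invalid bytes remain, then convert the binary value to a string: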

latin1-to-utf8: function [binary [binary!]][
    mark: :binary
    while [mark: invalid-utf? mark][        ; position of the next bad byte, or none
        change/part mark to char! mark/1 1  ; replace that one byte with its UTF-8 encoding
    ]
    to string! binary
]

This function modifies the original binary. We can create a new string instead that leaves the binary value intact:

latin1-to-utf8: function [binary [binary!]][
    mark: :binary
    to string! rejoin collect [
        while [mark: invalid-utf? binary][
            keep copy/part binary mark  ; keeps the portion up to the bad byte
            keep to char! mark/1        ; converts the bad byte to good bytes
            binary: next mark           ; set the series beyond the bad byte
        ]
        keep binary                     ; keep whatever is remaining
    ]
]
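
To tie this back to the question about files, here is a minimal sketch of how either version might be used end to end. The file names %latin1.txt and %utf8.txt are hypothetical placeholders, and the sketch assumes Rebol 3 behaviour, where read on a file returns a binary! and to binary! on a string! yields its UTF-8 bytes:

```rebol
; Hypothetical file names; substitute your own paths.
source: read %latin1.txt          ; assumed Rebol 3: READ on a file returns binary!
text: latin1-to-utf8 source       ; string! built from the Latin-1 bytes
write %utf8.txt to binary! text   ; TO BINARY! on a string! gives its UTF-8 bytes
```

With the first (modifying) version, source itself is rewritten in place, so pass copy source instead if the original bytes are still needed afterwards.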

Bonus: here's a wee Rebmu version of the above. Call it as rebmu/args snippet #{DECAFBAD}, where snippet is one of:

; modifying
IUgetLOAD"invalid-utf?"MaWT[MiuM][MisMtcTKm]tsA

; copying
IUgetLOAD"invalid-utf?"MaTSrjCT[wt[MiuA][kp copy/partAmKPtcFm AnxM]kpA]

answered Dec 19 '22 11:12 by rgchris