How can I decode a single character from a vector of octets in Common Lisp?
I want something like:
(decode-character vector :start i :encoding :utf-8)
or more specifically:
(decode-character #(195 164 195 173 99 195 176) :start 0)
=> #\LATIN_SMALL_LETTER_A_WITH_DIAERESIS
which would return the UTF-8 encoded character that starts at position i in vector.
I can't figure out how to do that using either babel or flexi-streams.
(defun decode-character (vector &rest args)
  (char (apply #'babel:octets-to-string
               (coerce vector '(vector (unsigned-byte 8)))
               args)
        0))
This may not be exactly what you are looking for (I'd gladly update it if I can).
I did not look at Babel, but I guess you could generalize the approach to other encodings. I'll stick with trivial-utf-8 here. I would do this:
(defun decode-utf-8-char (octet-vector &key (start 0))
  (char (trivial-utf-8:utf-8-bytes-to-string
         octet-vector
         :start start
         ;; Clamp so the window never runs past the end of the vector.
         :end (min (length octet-vector) (+ start 4)))
        0))
This gives the result you want with your example vector.
It works because a UTF-8 character is at most 4 octets long, so reading up to 4 octets from start is enough to cover the first character. The call to char grabs the first character in case more than one was decoded.
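If you would rather not decode more octets than necessary, the length of a UTF-8 sequence can be read off its first octet, so one character can be decoded by hand. The following is only a sketch: utf-8-sequence-length and decode-utf-8-char-at are hypothetical helpers, not part of Babel, flexi-streams, or trivial-utf-8. It assumes an implementation whose character codes are Unicode code points (SBCL and CCL, for example), and it does not reject overlong encodings or surrogate code points.

```lisp
;; Hypothetical helpers for illustration; not part of any library.
(defun utf-8-sequence-length (lead-octet)
  "Return the number of octets in the UTF-8 sequence starting with LEAD-OCTET."
  (cond ((< lead-octet #x80) 1)                 ; 0xxxxxxx
        ((= (logand lead-octet #xE0) #xC0) 2)   ; 110xxxxx
        ((= (logand lead-octet #xF0) #xE0) 3)   ; 1110xxxx
        ((= (logand lead-octet #xF8) #xF0) 4)   ; 11110xxx
        (t (error "Octet ~D cannot start a UTF-8 sequence." lead-octet))))

(defun decode-utf-8-char-at (octets &key (start 0))
  "Decode the character starting at START in OCTETS.
Return the character and, as a second value, the index just past it."
  (let* ((lead (aref octets start))
         (len (utf-8-sequence-length lead))
         (end (+ start len)))
    (when (> end (length octets))
      (error "Truncated UTF-8 sequence at index ~D." start))
    ;; Keep only the payload bits of the lead octet, then append
    ;; 6 payload bits from each continuation octet (10xxxxxx).
    (let ((code (if (= len 1)
                    lead
                    (logand lead (ash #xFF (- (1+ len)))))))
      (loop for i from (1+ start) below end
            for octet = (aref octets i)
            do (unless (= (logand octet #xC0) #x80)
                 (error "Invalid continuation octet at index ~D." i))
               (setf code (logior (ash code 6) (logand octet #x3F))))
      (values (code-char code) end))))
```

With the example vector, (decode-utf-8-char-at #(195 164 195 173 99 195 176) :start 0) returns the character with code 228 (ä) and 2 as the second value, which can be passed back as :start to decode the next character.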