
Decode a single character from octets in Lisp

How can I decode a single character from a vector of octets in Common Lisp?

I want something like:

(decode-character vector :start i :encoding :utf-8)

or more specifically:

(decode-character #(195 164 195 173 99 195 176) :start 0)
=> #\LATIN_SMALL_LETTER_A_WITH_DIAERESIS

which would return the UTF-8 encoded character that starts at position i in vector.

I can't figure out how to do that using either babel or flexi-streams.

Thayne asked Dec 08 '25 22:12



2 Answers

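;; Decode the octets with babel and keep only the first character of the
;; result. ARGS are passed straight through to babel:octets-to-string
;; (e.g. :start, :end, :encoding).
(defun decode-character (vector &rest args)
  (char (apply #'babel:octets-to-string
               (coerce vector '(vector (unsigned-byte 8))) args)
        0))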
huaiyuan answered Dec 10 '25 23:12



This may not be exactly what you are looking for (I'd gladly update if I can). I did not look at Babel, but I guess the approach could be generalized to other encodings. I'll stick with trivial-utf-8 here. I would do this:

(defun decode-utf-8-char (octet-vector &key (start 0))
  (char (trivial-utf-8:utf-8-bytes-to-string
          octet-vector
          :start start
          ;; Read at most 4 octets, without running past the end of the vector.
          :end (min (length octet-vector) (+ start 4)))
        0))

This gives the result you want with your example vector. It works because a UTF-8 character is at most 4 bytes long. The call to char grabs the first character in case more than one was decoded.
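One thing to watch for: a fixed 4-octet window can end partway through the next character, and how a decoder reacts to that truncated tail depends on the library. If that is a concern, an alternative is to read the sequence length from the lead octet and decode exactly that many octets. Below is a minimal sketch in plain Common Lisp with no library dependency. The names utf-8-sequence-length and decode-utf-8-char-at are my own, not part of babel or trivial-utf-8, and the sketch deliberately skips the stricter checks a real decoder performs (overlong 3- and 4-octet forms, surrogates, code points above U+10FFFF).

```lisp
;; Hypothetical helpers for illustration; not part of any library.

(defun utf-8-sequence-length (lead-octet)
  "Return how many octets the UTF-8 sequence starting with LEAD-OCTET
occupies, or NIL if LEAD-OCTET cannot start a sequence."
  (cond ((< lead-octet #x80) 1)          ; 0xxxxxxx: ASCII
        ((<= #xC2 lead-octet #xDF) 2)    ; 110xxxxx
        ((<= #xE0 lead-octet #xEF) 3)    ; 1110xxxx
        ((<= #xF0 lead-octet #xF4) 4)    ; 11110xxx
        (t nil)))                        ; continuation or invalid octet

(defun decode-utf-8-char-at (octets &key (start 0))
  "Decode the single UTF-8 character that starts at START in OCTETS.
Returns two values: the character and the index just past it."
  (let* ((lead (aref octets start))
         (len (or (utf-8-sequence-length lead)
                  (error "Octet ~D at index ~D cannot start a UTF-8 sequence."
                         lead start)))
         (end (+ start len)))
    (when (> end (length octets))
      (error "Truncated UTF-8 sequence at index ~D." start))
    ;; Payload bits in the lead octet: all of it for ASCII,
    ;; otherwise the low (7 - len) bits.
    (let ((code (if (= len 1)
                    lead
                    (ldb (byte (- 7 len) 0) lead))))
      ;; Each continuation octet (10xxxxxx) contributes 6 more bits.
      (loop for i from (1+ start) below end
            for octet = (aref octets i)
            do (unless (= (ldb (byte 2 6) octet) #b10)
                 (error "Invalid continuation octet at index ~D." i))
               (setf code (logior (ash code 6) (ldb (byte 6 0) octet))))
      (values (code-char code) end))))
```

With the vector from the question, (decode-utf-8-char-at #(195 164 195 173 99 195 176) :start 0) returns the character with code 228 (ä) and 2 as the second value, so the second value can be fed back in as :start to step through the vector one character at a time.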

coredump answered Dec 10 '25 22:12
