How would I get the character count of the string below in Python?
s = 'הוא אוסף אתכם מחר בשלוש וחצי.'
Char count: 29
Char length: 52
len(s) = 52
? = 29
Decode your byte string (according to whatever encoding it's in, UTF-8 maybe); the len of the resulting Unicode string is what you're after.
In fact, best practice is to decode inputs as soon as possible, deal only with actual text in your code (i.e., unicode in Python 2; it's just the way ordinary strings are in Python 3), and, if need be, encode again just as you're outputting.
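A minimal sketch of that decode-early, encode-late pattern, written for Python 3. The function name count_chars_per_line and the report format are hypothetical, chosen only for illustration, and UTF-8 is an assumed default encoding:

```python
def count_chars_per_line(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: prefix each line with its character count."""
    # Input boundary: turn bytes into text as early as possible.
    text = raw.decode(encoding)

    # Everything in between works on text only, so len() counts
    # characters (code points), not bytes.
    report = "\n".join(f"{len(line)}: {line}" for line in text.splitlines())

    # Output boundary: encode again only when handing the result back.
    return report.encode(encoding)
```

The bytes never leak into the middle of the function, so the counting logic does not need to know which encoding was used.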
Byte strings should be handled in your program only if it's specifically about byte strings (e.g., controlling or monitoring some hardware device). Far more programs are about text, and thus, except where indispensable at some I/O boundaries, they should deal exclusively with text strings (spelled unicode in Python 2 :-).
But if you do want to keep s as a byte string nevertheless,
len(s.decode('utf-8'))
(or whatever other encoding you're using to represent text as byte strings) should still do what you request.
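For reference, here is the same comparison as a small Python 3 sketch. It assumes the text is UTF-8 encoded. Since a Python 3 string literal is already Unicode, the byte string is produced explicitly with encode to mimic what a plain Python 2 literal would hold; the names raw, byte_length and char_count are just illustrative:

```python
s = 'הוא אוסף אתכם מחר בשלוש וחצי.'   # Python 3: already a Unicode str
raw = s.encode('utf-8')                # assumed encoding: UTF-8

byte_length = len(raw)                   # counts bytes
char_count = len(raw.decode('utf-8'))    # counts characters (code points)

print(byte_length)  # 52
print(char_count)   # 29
```

Each Hebrew letter takes two bytes in UTF-8 while spaces and the period take one, which accounts for the gap between 52 and 29. Note that len counts code points, so text containing combining marks may report more than the number of visible characters.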