I'm using RandomAccessFile and want to reserve a fixed-length portion of the file for the user to enter a note as a String. My understanding of UTF-8 is that different characters can take up different lengths, with the maximum being 3 bytes.
So I'm thinking my most user-friendly option is to tell the user they can enter up to 100 characters, and then reserve 100 × 3 = 300 bytes of space in the file for the string. If they use characters that need fewer bytes to encode, there will just be some wasted space.
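A minimal sketch of the fixed-slot idea, assuming a hypothetical class `NoteSlot`, a 2-byte length prefix written with `writeShort`, and zero padding for the unused space; none of this layout comes from the question. The limit is counted in Java chars (`String.length()`): a character that needs 4 UTF-8 bytes occupies two chars, so 3 bytes per char remains the worst case and a 100-char note always fits in 300 bytes.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class NoteSlot {
    // Assumed limit: 100 Java chars per note.
    static final int MAX_CHARS = 100;
    // Worst case is 3 UTF-8 bytes per Java char (4-byte characters take two chars).
    static final int MAX_BYTES = MAX_CHARS * 3;
    // Hypothetical layout: 2-byte length prefix followed by the payload area.
    static final int SLOT_SIZE = 2 + MAX_BYTES;

    static void writeNote(RandomAccessFile raf, long slotOffset, String note) throws IOException {
        if (note.length() > MAX_CHARS) {
            throw new IllegalArgumentException("Note longer than " + MAX_CHARS + " chars");
        }
        byte[] encoded = note.getBytes(StandardCharsets.UTF_8);
        raf.seek(slotOffset);
        raf.writeShort(encoded.length);                  // tells the reader where the text ends
        raf.write(encoded);
        raf.write(new byte[MAX_BYTES - encoded.length]); // zero padding keeps every slot SLOT_SIZE bytes
    }

    static String readNote(RandomAccessFile raf, long slotOffset) throws IOException {
        raf.seek(slotOffset);
        int len = raf.readUnsignedShort();               // matches the writeShort prefix
        byte[] encoded = new byte[len];
        raf.readFully(encoded);
        return new String(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("notes", ".dat");
        file.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            writeNote(raf, 0, "Gr\u00fc\u00dfe \uD83D\uDE00"); // mixes 1-, 2- and 4-byte characters
            writeNote(raf, SLOT_SIZE, "second note");          // slot n starts at n * SLOT_SIZE
            System.out.println(readNote(raf, SLOT_SIZE));      // second note
            System.out.println(raf.length());                  // 604 (two full slots)
        }
    }
}
```

The length prefix is one design choice among several; a terminating zero byte would also work, since UTF-8 never produces a zero byte except for U+0000 itself.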
Is this the typical strategy for this scenario or is there a better way to do this?
Thanks
> My understanding of utf-8 is that different characters can take up different lengths, the max taking up to 3 bytes.
Well, not quite. That's the case within the Basic Multilingual Plane (i.e. up to U+FFFF), but UTF-8 takes four bytes for characters beyond that, up to U+10FFFF, the highest code point Unicode defines. At that point your Java String objects would be using two chars (a surrogate pair) per character too, though.
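To see the char/byte relationship concretely, here is a small sketch (the class `CharCounts` and its helper `utf8Bytes` are hypothetical names) that prints, for one character in each UTF-8 length class, its Java char count, code point count and encoded UTF-8 size.

```java
import java.nio.charset.StandardCharsets;

public class CharCounts {
    // Number of bytes the string occupies when encoded as standard UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, 1 UTF-8 byte
            "\u00e9",       // U+00E9, 2 bytes
            "\u20ac",       // U+20AC, 3 bytes (still in the BMP)
            "\uD83D\uDE00"  // U+1F600, 4 bytes, stored in Java as a surrogate pair
        };
        for (String s : samples) {
            System.out.printf("U+%04X chars=%d codePoints=%d utf8Bytes=%d%n",
                s.codePointAt(0), s.length(), s.codePointCount(0, s.length()), utf8Bytes(s));
        }
    }
}
```

The last line printed is `U+1F600 chars=2 codePoints=1 utf8Bytes=4`: the 4-byte character also costs two chars, so bytes per Java char never exceeds 3.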
You can easily tell how many bytes a particular string actually needs, though: the simplest option is just to encode it and see how many bytes you get. I suspect it's more user-friendly to allow more text in most cases, i.e. limit the note by bytes rather than characters, and not be "fair" about exactly how many characters can be used (since some characters take more space than others). It really depends on whether your users will notice, and whether they want to use more than 100 characters...
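A sketch of the byte-limit alternative, assuming a hypothetical 300-byte budget and hypothetical helper names: `fits` is the "encode it and see" check, and `truncateToUtf8Bytes` shows one way to cut a note down to the budget without splitting a multi-byte character. Rejecting over-long input is an equally valid choice.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Budget {
    // "Encode it and see": true if the UTF-8 form fits in maxBytes.
    static boolean fits(String s, int maxBytes) {
        return s.getBytes(StandardCharsets.UTF_8).length <= maxBytes;
    }

    // Longest prefix of s whose UTF-8 encoding fits in maxBytes, never cutting
    // a code point in half. Lone surrogates are counted as 3 bytes, which
    // overestimates (getBytes replaces them with a single '?'), so the result still fits.
    static String truncateToUtf8Bytes(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int cpBytes = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + cpBytes > maxBytes) {
                break;
            }
            bytes += cpBytes;
            i += Character.charCount(cp); // advance 2 chars for a surrogate pair
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        String note = "caf\u00e9 \u20ac5";            // 10 bytes in UTF-8
        System.out.println(fits(note, 300));           // true
        String cut = truncateToUtf8Bytes(note, 6);     // "caf\u00e9 " (the euro sign would not fit)
        System.out.println(cut.getBytes(StandardCharsets.UTF_8).length); // 6
    }
}
```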
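UTF-8 can actually take up to 4 bytes per character. Such characters occupy two Java chars, though, so if the 100-character limit is counted with String.length(), 3 bytes per char is still enough; if it is counted in code points, budget 4 bytes each. But yes, that approach is solid if you really want to allow your user to enter any possible Unicode character.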