Reserving a fixed length string space in a file when using utf-8?

Tags:

java

I'm using RandomAccessFile, and want to reserve a fixed length portion of the file for the user to enter a note as a String. My understanding of utf-8 is that different characters can take up different lengths, the max taking up to 3 bytes.

So I'm thinking my most user-friendly option is to tell the user they can enter up to 100 characters, then reserve 100 * 3 bytes of space in the file for the string. If they use characters that need fewer bytes to encode, some of that space is simply wasted.

Is this the typical strategy for this scenario or is there a better way to do this?

Thanks

user291701 asked Oct 03 '22 08:10


2 Answers

"My understanding of utf-8 is that different characters can take up different lengths, the max taking up to 3 bytes."

Well, not quite. That's the case within the Basic Multilingual Plane (i.e. up to U+FFFF), but UTF-8 takes four bytes for code points above that, up to U+10FFFF. (The four-byte form has room for values up to U+1FFFFF, but Unicode defines nothing beyond U+10FFFF.) At that point your Java String objects would be using more than one char per character too, since such code points are stored as a surrogate pair of two chars.
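To make the byte counts concrete, here is a small sketch that prints how many Java chars and how many UTF-8 bytes a few sample characters take. The class name Utf8Lengths and the helper utf8Length are made up for illustration; only String.getBytes, String.codePointAt and Character.toChars are standard library calls.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not from the original post.
final class Utf8Lengths {

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "a",                                    // U+0061: 1 byte
            "\u00e9",                               // U+00E9: 2 bytes
            "\u20ac",                               // U+20AC: 3 bytes
            new String(Character.toChars(0x1F600))  // U+1F600, outside the BMP: 4 bytes
        };
        for (String s : samples) {
            System.out.printf("U+%04X chars=%d bytes=%d%n",
                    s.codePointAt(0), s.length(), utf8Length(s));
        }
    }
}
```

The last sample is the interesting one: it is a single code point, but s.length() reports 2 because Java stores it as a surrogate pair.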

You can reasonably easily tell the length a particular string actually uses though - the simplest option is just to encode it and see how many bytes you get. I suspect that it's more user-friendly to allow more text in most cases, but not be "fair" about exactly how many characters can be used (i.e. with some characters taking more space than others). It really depends on whether your users will notice, and whether they want to use more than 100 characters...
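A rough sketch of the "encode it and see how many bytes you get" approach with RandomAccessFile follows. The slot size NOTE_BYTES, the class FixedNoteField and the zero-padding convention are all assumptions made for this example, not something the question specifies. Zero padding also assumes notes never contain U+0000; a length prefix would avoid that restriction.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical fixed-length note field; names and sizes are illustrative.
final class FixedNoteField {

    static final int NOTE_BYTES = 300; // assumed slot size in the file

    // Encodes the note as UTF-8 and zero-pads it to NOTE_BYTES.
    // Rejects notes whose encoded form does not fit in the slot.
    static byte[] toField(String note) {
        byte[] encoded = note.getBytes(StandardCharsets.UTF_8);
        if (encoded.length > NOTE_BYTES) {
            throw new IllegalArgumentException(
                    "Note needs " + encoded.length + " bytes, slot holds " + NOTE_BYTES);
        }
        return Arrays.copyOf(encoded, NOTE_BYTES); // pads with zero bytes
    }

    // Decodes the bytes before the first zero byte (the padding).
    static String fromField(byte[] field) {
        int end = 0;
        while (end < field.length && field[end] != 0) {
            end++;
        }
        return new String(field, 0, end, StandardCharsets.UTF_8);
    }

    // Writes the padded note at the given file offset.
    static void writeNote(RandomAccessFile file, long offset, String note) throws IOException {
        file.seek(offset);
        file.write(toField(note));
    }

    // Reads the whole slot back and strips the padding.
    static String readNote(RandomAccessFile file, long offset) throws IOException {
        byte[] field = new byte[NOTE_BYTES];
        file.seek(offset);
        file.readFully(field);
        return fromField(field);
    }
}
```

With this shape the limit is expressed in bytes, so a note in plain ASCII can run to 300 characters while one in a three-byte script stops at 100. The UI can call toField early and report the overflow instead of promising a fixed character count.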

Jon Skeet answered Oct 11 '22 09:10


UTF-8 can actually take up to 4 bytes per character. But yes, the approach is solid if you size the reserved space accordingly and really want to allow your user to enter any possible Unicode character.
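If the limit is meant as 100 user-visible code points, a worst-case reservation could be sketched like this. NoteSizing, MAX_CODE_POINTS and withinLimit are hypothetical names chosen for the example.

```java
// Hypothetical sizing helper; the 100-code-point limit comes from the question,
// the names are made up for this sketch.
final class NoteSizing {

    static final int MAX_CODE_POINTS = 100;

    // Worst case: every code point lies outside the BMP and needs 4 UTF-8 bytes.
    static final int WORST_CASE_BYTES = MAX_CODE_POINTS * 4;

    // Counts code points rather than Java chars, so a surrogate pair counts once.
    static boolean withinLimit(String note) {
        return note.codePointCount(0, note.length()) <= MAX_CODE_POINTS;
    }
}
```

One design point worth noting: if the limit is instead enforced with String.length(), which counts UTF-16 chars, then 3 bytes per char is already enough for well-formed strings, because a code point that needs 4 UTF-8 bytes also occupies 2 chars.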

Jan Dörrenhaus answered Oct 11 '22 08:10