
Trying to understand the Ruby .chr and .ord methods

Tags: ruby, encoding

I've been working with the Ruby chr and ord methods recently and there are a few things I don't understand.

My current project involves converting individual characters to and from ordinal values. As I understand it, calling ord on a single-character string like "A" returns its position in the ASCII table, which is 65. The inverse, 65.chr, returns the character "A". This tells me that Ruby has an ordered collection of character values somewhere, which it uses to look up the position of a given character or the character at a given position. I may be wrong about this; please correct me if I am.

I also understand that Ruby's default character encoding is UTF-8, so it can work with thousands of possible characters. Thus if I ask it for something like this:

'好'.ord

I get the position of that character which is 22909. However, if I call chr on that value:

22909.chr

I get "RangeError: 22909 out of char range." I'm only able to get chr to work on values up to 255, which is the extended ASCII range. So my questions are:

  • Why does Ruby seem to be getting values for chr from the extended ASCII character set but ord from UTF-8?
  • Is there any way to tell Ruby to use different encodings when it uses these methods? For instance, tell it to use ASCII-8BIT encoding instead of whatever it's defaulting to?
  • If it is possible to change the default encoding, is there any way of getting the total number of characters available in the set being used?
Jonathon Nordquist asked Jun 14 '16 19:06


1 Answer

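According to the Integer#chr documentation, chr accepts an optional encoding argument. Without it, values 0 to 127 come back as US-ASCII, values 128 to 255 as ASCII-8BIT, and anything larger raises RangeError unless Encoding.default_internal is set. Passing Encoding::UTF_8 forces the result to be UTF-8: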

22909.chr(Encoding::UTF_8)
#=> "好"
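To make the asymmetry between ord and chr concrete, here is a minimal sketch. The variable names are only illustrative. It shows that String#ord follows the encoding of the string it is called on, while Integer#chr needs to be told which encoding to produce once the value goes beyond 255.

```ruby
# Integer#chr with no argument only covers 0..255 (unless
# Encoding.default_internal has been set).
ascii  = 65.chr                      # US-ASCII string
binary = 200.chr                     # single raw byte, ASCII-8BIT string

# Beyond 255 an explicit encoding is needed.
hao = 22909.chr(Encoding::UTF_8)     # the character U+597D

# String#ord reads the code point using the string's own encoding,
# so it round-trips with chr(Encoding::UTF_8).
round_trip = hao.ord

# Calling chr with no argument on a large value raises RangeError,
# assuming Encoding.default_internal is nil (the usual default).
no_arg_error =
  begin
    22909.chr
    nil
  rescue RangeError => e
    e.class
  end

puts ascii.encoding
puts binary.encoding
puts hao.encoding
puts round_trip
```

So the two methods are not reading from different tables; ord simply inherits its encoding from the string, and chr has no string to inherit from.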

To list all available encoding names:

Encoding.name_list
#=> ["ASCII-8BIT", "UTF-8", "US-ASCII", "UTF-16BE", "UTF-16LE", "UTF-32BE", "UTF-32LE", "UTF-16", "UTF-32", ...]
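Any of those names can be turned into an Encoding object and handed to chr. The following is a small sketch, assuming ISO-8859-1 is among the encodings your Ruby build lists (it is in standard builds). It shows that the same integer produces different bytes depending on the encoding you ask for.

```ruby
# Look up encodings by name, as they appear in Encoding.name_list.
latin1_enc = Encoding.find("ISO-8859-1")
utf8_enc   = Encoding.find("UTF-8")

# The same ordinal, 233 (e with acute accent), in two encodings.
latin1 = 233.chr(latin1_enc)
utf8   = 233.chr(utf8_enc)

latin1_bytes = latin1.bytes   # one byte in Latin-1
utf8_bytes   = utf8.bytes     # two bytes in UTF-8

# ord gives the same code point back in both cases, and transcoding
# the Latin-1 string yields the UTF-8 one.
same_ord   = latin1.ord == utf8.ord
transcoded = latin1.encode(utf8_enc) == utf8

p latin1_bytes
p utf8_bytes
p same_ord
p transcoded
```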

A hacky way to count the characters an encoding can represent is to try every integer and count the ones chr accepts:

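# Count the integers below 2,000,000 that are valid UTF-8 code points.
2_000_000.times.count do |i|
  begin
    i.chr(Encoding::UTF_8)
    true
  rescue RangeError
    # Raised for surrogates and for values above 0x10FFFF.
    false
  end
end
#=> 1112064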
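That figure can be cross-checked without brute force. Unicode code points run from 0 to 0x10FFFF, and the surrogate range 0xD800 to 0xDFFF is not valid in UTF-8. The sketch below does the arithmetic and probes the boundaries; the helper name valid_utf8_codepoint? is made up for this example.

```ruby
# Total code points in the Unicode range 0..0x10FFFF.
total = 0x10FFFF + 1

# Surrogates are reserved for UTF-16 and rejected by chr(Encoding::UTF_8).
surrogates = (0xD800..0xDFFF).size

valid = total - surrogates

# Hypothetical helper: true if Integer#chr accepts the value as UTF-8.
def valid_utf8_codepoint?(i)
  i.chr(Encoding::UTF_8)
  true
rescue RangeError
  false
end

puts valid
puts valid_utf8_codepoint?(0x10FFFF)   # last valid code point
puts valid_utf8_codepoint?(0x110000)   # one past the end
puts valid_utf8_codepoint?(0xD800)     # first surrogate
```

The subtraction gives 1114112 - 2048 = 1112064, matching the count above.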
Nabeel answered Sep 20 '22 13:09