
MRI Internals: detailed explanation of rb_id2str

Tags:

ruby

In MRI, it appears that rb_id2str() is responsible for doing all of the work when you call Symbol#to_s. I was surprised to discover that it is an extremely cryptic function for something I assumed would be a fairly straightforward operation.

I'm looking for a detailed explanation of what this function is doing. For reference, here is a link to the source in 1.9.3:

http://rxr.whitequark.org/mri/source/parse.y?v=1.9.3-p195#9950

Some specific questions:

What are the four major if blocks doing?

  1. if (id < tLAST_TOKEN)
  2. if (id < INT_MAX && rb_ispunct((int)id))
  3. if (st_lookup(global_symbols.id_str, id, &data))
  4. if (is_attrset_id(id))

It would be great to get a generic overview of what each block of code inside the if statements does, but it doesn't need to be a line-by-line analysis.

Finally, I'm curious about the memory/garbage collection implications of to_s: does calling Symbol#to_s create a new string that has to be garbage collected every time, or is there something like an internal copy-on-write optimization that uses a reference to the interned representation of the symbol until the string is mutated?

Fitzsimmons asked Nov 08 '12 00:11



1 Answer

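For one thing, I'm pretty sure Symbol#to_s creates a new String object every time it is called. Instances of most Ruby classes are C structs allocated on the heap, but true, false, nil, Fixnums and Symbols are immediate values: they are encoded directly in the VALUE word (an unsigned integer type in C) instead of pointing to a struct. So internally a Symbol is a completely different thing from a String (that's why symbols are recommended over strings unless you need to change the value a lot).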
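A small sketch of what this looks like from the Ruby side. It only checks object identity and contents, so it cannot show whether MRI shares the underlying character buffer between the symbol's name and the returned string; that internal detail is not visible here and is not claimed.

```ruby
sym = :hello

# Symbols are interned: every occurrence of :hello is the very same object.
same_symbol = sym.equal?(:hello)

# Symbol#to_s returns a separate String object on each call.
a = sym.to_s
b = sym.to_s
distinct_strings = !a.equal?(b)
equal_contents = (a == b)

# Mutating one returned string changes neither the other string nor the symbol.
a << " world"

puts "same symbol object:   #{same_symbol}"
puts "distinct String objs: #{distinct_strings}"
puts "equal contents:       #{equal_contents}"
puts "a after mutation:     #{a.inspect}"
puts "b unchanged:          #{b.inspect}"
puts "symbol unchanged:     #{sym.to_s.inspect}"
```

Because each call allocates a new String, calling to_s in a hot loop produces garbage for the GC to collect, even though the symbol itself is never duplicated.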

I'm not sure if you know about the book Ruby Hacking Guide; it explains a lot about how MRI is implemented in C.

FYI, Ruby Hacking Guide is written in Japanese, and so far only a small part of it has been translated into English; it looks like the translators have given up on it: http://rhg.rubyforge.org/

Dean Winchester answered Oct 02 '22 12:10
