In MRI, it appears that rb_id2str() is responsible for doing all of the work when you call Symbol#to_s. I was surprised to discover that this is an extremely cryptic function for something I assumed would be a fairly straightforward operation.
I'm looking for a detailed explanation of what this function is doing. For reference, here is a link to the source in 1.9.3:
http://rxr.whitequark.org/mri/source/parse.y?v=1.9.3-p195#9950
Some specific questions:
What are the four major if blocks doing?
if (id < tLAST_TOKEN)
if (id < INT_MAX && rb_ispunct((int)id))
if (st_lookup(global_symbols.id_str, id, &data))
if (is_attrset_id(id))
It would be great to get a generic overview of what each block of code inside the if statements does, but it doesn't need to be a line-by-line analysis.
Finally, I'm curious about the memory/garbage collection implications of to_s: does calling Symbol#to_s create a new string that has to be garbage collected every time, or is there something like an internal copy-on-write optimization that uses a reference to the interned representation of the symbol until the string is mutated?
For one thing, I'm pretty sure Symbol#to_s creates a new string. Most Ruby objects are C structs, but true, false, nil, Fixnums and Symbols are immediate values: they are stored directly in the VALUE word (an integer type in C) rather than in a heap-allocated struct. So a Symbol is a whole different story from a String (that's why Symbols are recommended unless you need to change the value a lot).
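A quick way to check this from Ruby itself is to compare the objects returned by two calls. This is only a minimal sketch (the variable names are made up for illustration), and it shows object identity only. It says nothing about whether MRI shares the underlying character buffer internally.

```ruby
# Two calls to Symbol#to_s on the same symbol.
sym = :example
a = sym.to_s
b = sym.to_s

puts a == b        # compares contents
puts a.equal?(b)   # compares object identity

# Mutating one result leaves the other result and the symbol untouched.
a << "_changed"
puts a
puts b
puts sym.inspect
```

If equal? reports false, each call has handed back a separate String object, and each of those objects is eventually subject to garbage collection.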
I'm not sure if you know about the book Ruby Hacking Guide; it explains a lot about how MRI is implemented in C.
FYI, Ruby Hacking Guide is written in Japanese, and so far only a small part has been translated. It looks like the translators have given up on it. http://rhg.rubyforge.org/