Is anyone able to pinpoint which algorithm is used for the include? method in Ruby? For example
"helloworld".include?("hello")
A string in Ruby is an object (like most things in Ruby). You can create a string with either String::new or as literal (i.e. with the double quotes "" ). But you can also create string with the special %() syntax With the percent sign syntax, the delimiters can be any special character.
In Ruby, string is a sequence of one or more characters. It may consist of numbers, letters, or symbols. Here strings are the objects, and apart from other languages, strings are mutable, i.e. strings can be changed in place instead of creating new strings.
index is a String class method in Ruby which is used to returns the index of the first occurrence of the given substring or pattern (regexp) in the given string. It specifies the position in the string to begin the search if the second parameter is present. It will return nil if not found.
As emboss states in his answer, String#include
calls rb_str_index
. This function in turn calls rb_memsearch
, which implements the Rabin-Karp string search algorithm, according to this post on ruby-forum.com
.
The Ruby Language Specification doesn't prescribe any particular algorithm. Every implementation can use whatever algorithm they want.
For example, in Rubinius, String#include?
calls String#find_string
:
def include?(needle)
if needle.kind_of? Fixnum
needle = needle % 256
str_needle = needle.chr
else
str_needle = StringValue(needle)
end
!!find_string(str_needle, 0)
end
String#find_string
in turn is implemented via the string_index
primitive:
def find_string(pattern, start)
Rubinius.primitive :string_index
raise PrimitiveFailure, "String#find_string failed"
end
The string_index
primitive is implemented by the rubinius::String::index
function:
// Rubinius.primitive :string_index
Fixnum* index(STATE, String* pattern, Fixnum* start);
rubinius::String::index
:
Fixnum* String::index(STATE, String* pattern, Fixnum* start) {
native_int total = size();
native_int match_size = pattern->size();
if(start->to_native() < 0) {
Exception::argument_error(state, "negative start given");
}
switch(match_size) {
case 0:
return start;
case 1:
{
uint8_t* buf = byte_address();
uint8_t matcher = pattern->byte_address()[0];
for(native_int pos = start->to_native(); pos < total; pos++) {
if(buf[pos] == matcher) return Fixnum::from(pos);
}
}
return nil<Fixnum>();
default:
{
uint8_t* buf = byte_address();
uint8_t* matcher = pattern->byte_address();
uint8_t* last = buf + (total - match_size);
uint8_t* pos = buf + start->to_native();
while(pos <= last) {
// Checking *pos directly then also checking memcmp is an
// optimization. It's about 10x faster than just calling memcmp
// everytime.
if(*pos == *matcher &&
memcmp(pos, matcher, match_size) == 0) {
return Fixnum::from(pos - buf);
}
pos++;
}
}
return nil<Fixnum>();
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With