
Split words in Ruby for counting

When I split the string "hello world \n" with

"hello world \n".scan(/\w+/)

I get ["hello", "world"]

I would like to count \n and \t as strings as well.

Prabesh Shrestha asked Apr 06 '11 07:04


People also ask

How do you split a word in Ruby?

The general syntax for the split method is string.split(pattern, limit), where both arguments are optional. The pattern (a string or a regular expression) specifies where to split the string; without it, the string is split on whitespace. The resulting substrings are returned together in an array.
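A minimal sketch of String#split in the forms described above; the sample strings are made up for illustration. With no argument it splits on runs of whitespace and drops leading and trailing whitespace, which is why the trailing "\n" from the question disappears:

```ruby
# No argument: split on runs of whitespace, ignoring leading/trailing whitespace.
words = "hello world \n".split
p words      # => ["hello", "world"]

# Explicit string delimiter; empty fields in the middle are kept.
csv = "a,b,,c".split(",")
p csv        # => ["a", "b", "", "c"]

# Regular expression delimiter: one or more commas or whitespace characters.
mixed = "a, b,c  d".split(/[,\s]+/)
p mixed      # => ["a", "b", "c", "d"]

# The optional limit caps the number of returned substrings.
limited = "a b c d".split(" ", 2)
p limited    # => ["a", "b c d"]
```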

How do you count in Ruby?

Array#count returns the number of elements in the array. It can also count how many times a particular element occurs, or how many elements satisfy a block. Syntax: array.count, array.count(obj) or array.count { |element| ... }. Parameter: obj, the element to look for (optional). Return value: the count, as an Integer.
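A short illustration of the three forms of Array#count; the token array is invented for the example:

```ruby
tokens = ["hello", "world", "\n", "hello"]

# No argument: the number of elements.
total = tokens.count                            # => 4

# With an argument: elements equal (==) to that object.
hellos = tokens.count("hello")                  # => 2

# With a block: elements for which the block returns a truthy value.
long_words = tokens.count { |t| t.length > 1 }  # => 3
```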

How do you count strings in Ruby?

String#count treats each argument as a set of characters and counts the characters in the string that belong to the intersection of all the given sets. A set that starts with a caret (^) is negated, and ranges such as a-z can be used. Syntax: str.count(set, ...), where str is the string to examine.
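A sketch of String#count using a made-up string, including counting the \n and \t characters the question is interested in:

```ruby
text = "hello world\n\tand more\n"

# Each argument is a set: count every 'l' and every 'o'.
lo = "hello world".count("lo")                   # 3 x "l" + 2 x "o" => 5

# Ranges are allowed and several sets are intersected:
# letters a-y, but not 'l' (a leading ^ negates the set).
a_to_y_not_l = "hello world".count("a-y", "^l")  # => 7

# Control characters such as newlines and tabs can be counted directly.
newlines = text.count("\n")                      # => 2
tabs     = text.count("\t")                      # => 1
```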

How do you get a substring in Ruby?

Ruby has no method called substring; instead you use String#[] (or its alias slice) with a start index and a length, or with a range. A range is written with two periods between the starting and ending index to include the end (str[0..4]), or with three periods to exclude it (str[0...5]).
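Some ways to take a substring with String#[] and String#slice, using an illustrative string:

```ruby
s = "hello world"

first  = s[0, 5]        # start index and length           => "hello"
range  = s[0..4]        # inclusive range                  => "hello"
excl   = s[0...5]       # range excluding the end index    => "hello"
tail   = s[6..-1]       # negative indices count from end  => "world"
last3  = s[-3, 3]       # last three characters            => "rld"
sliced = s.slice(6, 5)  # slice is an alias of String#[]   => "world"
```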


2 Answers

Do you want something like this?

"hello world \n".scan(/\w+|\n/)
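The question also mentions \t. One possible extension of the same regular expression keeps tabs as tokens too and then counts the results; the input string is made up, and Enumerable#tally requires Ruby 2.7 or later:

```ruby
input = "hello world \n\tfoo\n"

# Runs of word characters, or a single newline or tab character.
tokens = input.scan(/\w+|[\n\t]/)
# => ["hello", "world", "\n", "\t", "foo", "\n"]

token_count = tokens.size   # => 6

# Frequency of each token (Enumerable#tally, Ruby 2.7+).
frequencies = tokens.tally
# => {"hello" => 1, "world" => 1, "\n" => 2, "\t" => 1, "foo" => 1}
```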
Dutow answered Sep 28 '22 14:09


Do not use \w+ for counting words. It splits numbers that contain punctuation, as well as words that contain non-ASCII (Unicode) characters. For example:

"The floating point number is 13.5812".scan /\w+/
=> ["The", "floating", "point", "number", "is", "13", "5812"]

The same is true for numbers with other separators, such as "12,000".

In Ruby 1.8 the expression \w+ matched Unicode characters, but this has changed: \w now matches only ASCII word characters. If your string contains non-ASCII letters, such words are split apart, too:

"Die Apfelbäume".scan /\w+/
=> ["Die", "Apfelb", "ume"]

There are two options here.

  1. If you want to skip numbers altogether, just use

    /\p{Letter}+/
    
  2. If you don't want to skip numbers because you want to count them as words, too, use

    /\S+/
    

    The expression \S+ matches runs of non-whitespace characters, i.e. characters matching /[^ \t\r\n\f\v]/. The only disadvantage is that punctuation such as brackets, hyphens and dots stays attached to your words. For the sole purpose of counting, this should not be a problem.

    If you need the words themselves, too, you will have to strip those extra characters.
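A sketch applying both options to the example strings from this answer, plus a made-up string with a bracketed word. It uses \p{L}, the short name of the Unicode Letter general category; the punctuation-stripping regex at the end is only one possible approach, not something the answer prescribes. It assumes UTF-8 source encoding (the default since Ruby 2.0):

```ruby
sentence = "The floating point number is 13.5812"
german   = "Die Apfelbäume"
prices   = "It costs 12,000 (roughly)."   # made-up example

# Option 1: runs of Unicode letters; numbers are skipped entirely.
opt1_sentence = sentence.scan(/\p{L}+/)  # => ["The", "floating", "point", "number", "is"]
opt1_german   = german.scan(/\p{L}+/)    # => ["Die", "Apfelbäume"]

# Option 2: runs of non-whitespace; numbers stay whole,
# but punctuation remains attached to the words.
opt2_prices = prices.scan(/\S+/)         # => ["It", "costs", "12,000", "(roughly)."]

# For counting, the array size is all that is needed.
word_count = opt2_prices.size            # => 4

# If the words themselves are needed: strip leading and trailing
# characters that are neither letters (\p{L}) nor digits (\p{N}).
cleaned = opt2_prices.map { |w| w.gsub(/\A[^\p{L}\p{N}]+|[^\p{L}\p{N}]+\z/, "") }
# => ["It", "costs", "12,000", "roughly"]
```

Option 1 drops "13.5812" completely, while option 2 keeps "12,000" as a single word but returns "(roughly)." with its punctuation until it is stripped.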

Konrad Reiche answered Sep 28 '22 14:09