Scala: How to split words using multiple delimiters

Tags:

split

scala

Suppose I have a text file like this:

Apple#mango&banana@grapes

The data needs to be split on multiple delimiters before performing the word count.

How can I do that?

Ankita asked Aug 18 '17 13:08



2 Answers

Use the split method. It takes a regular expression, so a character class matches any one of the delimiters:

scala> "Apple#mango&banana@grapes".split("[#&@]")
res0: Array[String] = Array(Apple, mango, banana, grapes)
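
If the delimiters come from a list rather than a hand-written character class, the regex can be assembled from them. The following is only a sketch: the object name MultiDelimiterSplit and the helper splitOn are made up for illustration, and it assumes the delimiter list is non-empty. Pattern.quote escapes each delimiter so that characters such as "|" or "." are treated literally, and the trailing "+" collapses runs of consecutive delimiters.

```scala
import java.util.regex.Pattern

// Hypothetical helper object, not part of the Scala standard library.
object MultiDelimiterSplit {

  // Splits `text` on any of the literal `delimiters` (assumed non-empty).
  def splitOn(text: String, delimiters: Seq[String]): Array[String] = {
    // Quote each delimiter so regex metacharacters are matched literally,
    // then join them into one alternation that matches one or more in a row.
    val regex = delimiters.map(d => Pattern.quote(d)).mkString("(?:", "|", ")+")
    // A leading delimiter would yield an empty first token, so drop empties.
    text.split(regex).filter(_.nonEmpty)
  }

  def main(args: Array[String]): Unit = {
    val words = splitOn("Apple#mango&banana@grapes", Seq("#", "&", "@"))
    println(words.mkString(", ")) // prints: Apple, mango, banana, grapes
  }
}
```

For a fixed set of single-character delimiters, the character class "[#&@]" shown above is shorter and does the same job.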
Alper t. Turker answered Nov 15 '22 11:11


If you just want to count words, you don't need to split. Something like this will do:

  val numWords = """\b\w""".r.findAllIn(string).length

This regex matches the start of each word: \b is a zero-length word boundary and \w is any "word" character (letter, digit, or underscore). findAllIn returns every match in the string, and the number of matches is the number of words.
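
As a self-contained sketch of that idea (the object name WordStartCount and the method countWords are illustrative, not from any library), applied to the sample line from the question:

```scala
// Hypothetical wrapper around the \b\w regex described above.
object WordStartCount {

  // Matches a single word character that sits right after a word boundary,
  // i.e. the first character of every word.
  private val wordStart = """\b\w""".r

  // Counts word starts; no splitting is performed.
  def countWords(text: String): Int = wordStart.findAllIn(text).size

  def main(args: Array[String]): Unit = {
    println(countWords("Apple#mango&banana@grapes")) // prints: 4
  }
}
```

Note that anything outside \w ends a word, so a string like "don't" counts as two words with this approach.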

If you want to count each word separately, and do it across multiple lines, then split is probably a better option:

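    source
      .getLines()
      .flatMap(_.split("\\W+"))   // split on runs of non-word characters
      .filterNot(_.isEmpty)       // drop the empty token a leading delimiter produces
      .toList                     // getLines returns an Iterator, which has no groupBy
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size } // word -> count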
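
To try this without a file on disk, here is a minimal runnable sketch that feeds the same pipeline from an in-memory string via Source.fromString. The object name WordCount, the method countWords, and the sample text are assumptions made for the example; for a real file, Source.fromFile would take the place of Source.fromString.

```scala
import scala.io.Source

// Hypothetical wrapper for trying the per-word count on in-memory text.
object WordCount {

  // Counts occurrences of each word across all lines (case-sensitive).
  def countWords(lines: Iterator[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\W+"))  // split each line on runs of non-word characters
      .filter(_.nonEmpty)        // drop empty tokens from leading delimiters
      .toList                    // materialize: Iterator does not offer groupBy
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    // Stand-in for the text file from the question.
    val source = Source.fromString("Apple#mango&banana@grapes\nmango&Apple#apple")
    try {
      countWords(source.getLines()).toSeq.sortBy(_._1).foreach {
        case (word, count) => println(s"$word: $count")
      }
    } finally source.close()
  }
}
```

The count is case-sensitive, so "Apple" and "apple" are tallied separately; lowercasing each token before grouping would merge them.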
Dima answered Nov 15 '22 10:11