Suppose I have a text file like this:
Apple#mango&banana@grapes
The data needs to be split on multiple delimiters before performing the word count.
How can I do that?
Use the split() method to split a string in Scala. Scala provides a method called split(), which splits a given string into an array of strings using the delimiter passed as a parameter. Optionally, the limit parameter caps the total number of elements in the resulting array.
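A minimal sketch of the limit parameter described above. The object name and the sample string are made up for illustration; split(regex, limit) is the java.lang.String method that Scala strings expose.

```scala
object SplitLimitDemo {
  // Hypothetical sample input using a single delimiter.
  val line = "Apple#mango#banana#grapes"

  // No limit: split on every '#'.
  val all: Array[String] = line.split("#")

  // limit = 2: at most two elements; the remainder stays unsplit in the last one.
  val firstAndRest: Array[String] = line.split("#", 2)
}
```

Here all holds Apple, mango, banana, grapes, while firstAndRest holds Apple and mango#banana#grapes.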
Use an Array for multiple delimiter characters. Strings can contain sentences, but often they contain lists of structured data in which fields are separated by commas (or other characters). Split helps us process this data. With split, a Scala method that acts on StringLike values, we specify a delimiter or many delimiters.
Sometimes a string has more than one delimiter character. In that case, call split and pass an Array argument whose elements are the characters to split on (the delimiters). split then handles the various delimiters correctly.
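As a sketch of the Array form, applied to the sample line from the question (the object and value names are made up for illustration):

```scala
object SplitCharsDemo {
  val line = "Apple#mango&banana@grapes"

  // Each Char in the array is treated as a delimiter.
  val fruits: Array[String] = line.split(Array('#', '&', '@'))
}
```

fruits then contains Apple, mango, banana, grapes. Because the delimiters are passed as plain characters, none of them needs regex escaping.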
Use the split method:
scala> "Apple#mango&banana@grapes".split("[#&@]")
res0: Array[String] = Array(Apple, mango, banana, grapes)
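One thing to watch for with the character-class regex: if two delimiters appear next to each other, split returns an empty string between them. A small sketch, assuming a made-up input with an adjacent "#&":

```scala
object SplitRegexDemo {
  // Hypothetical input with two delimiters in a row ("#&").
  val line = "Apple#&mango@grapes"

  // One delimiter per match: the adjacent "#&" yields an empty string.
  val withEmpty: Array[String] = line.split("[#&@]")

  // "+" treats a run of delimiters as a single separator.
  val collapsed: Array[String] = line.split("[#&@]+")
}
```

withEmpty contains Apple, an empty string, mango, grapes; collapsed contains Apple, mango, grapes.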
If you just want to count words, you don't need to split. Something like this will do:
val numWords = """\b\w""".r.findAllIn(string).length
This is a regex that matches the start of a word: \b is a (zero-length) word boundary and \w is any "word" character (letter, number or underscore). You get all the matches in your string, and then just check how many there are.
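The same idea wrapped in a function, as a sketch (countWords and the object name are made-up names, not part of any library):

```scala
object CountWordsDemo {
  // Counts word starts: each match of \b\w is the first character of one word.
  def countWords(text: String): Int =
    """\b\w""".r.findAllIn(text).length

  // The sample line from the question has four words.
  val n: Int = countWords("Apple#mango&banana@grapes")
}
```

Note that because underscore counts as a word character, a token such as foo_bar is counted as one word.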
If you are looking to count each word separately, and do it across multiple lines, then split is probably a better option:
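source
  .getLines                   // Iterator[String], one element per line
  .flatMap(_.split("\\W+"))   // split each line on runs of non-word characters
  .filterNot(_.isEmpty)       // drop the empty token a leading delimiter produces
  .toList                     // Iterator has no groupBy, so materialize first
  .groupBy(identity)          // Map(word -> all of its occurrences)
  .mapValues(_.size)          // Map(word -> count)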
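A self-contained sketch of the per-word count that can be run without a file. It assumes the lines arrive as an Iterator[String], which is what Source.getLines returns; wordCount, the object name and the two sample lines are made up for illustration. The iterator is turned into a List before grouping because Iterator itself does not provide groupBy.

```scala
object WordCountDemo {
  // Splits every line on runs of non-word characters and counts each word.
  def wordCount(lines: Iterator[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\W+"))   // tokens from all lines
      .filter(_.nonEmpty)         // a leading delimiter produces an empty token
      .toList                     // materialize so that groupBy is available
      .groupBy(identity)          // word -> list of its occurrences
      .map { case (word, occurrences) => word -> occurrences.size }

  // Hypothetical two-line input standing in for the text file.
  val counts: Map[String, Int] =
    wordCount(Iterator("Apple#mango&banana@grapes", "mango@Apple#mango"))
}
```

For the two sample lines, counts maps Apple to 2, mango to 3, banana to 1 and grapes to 1. With a real file, the iterator would come from something like scala.io.Source.fromFile(path).getLines, where path is whatever file you are reading.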