I'm trying to write, in the simplest way possible, a program that counts word occurrences in a file in Scala. So far I have this piece of code:
import scala.io.Codec.string2codec
import scala.io.Source
import scala.reflect.io.File

object WordCounter {
  val SrcDestination: String = ".." + File.separator + "file.txt"
  val Word = "\\b([A-Za-z\\-])+\\b".r

  def main(args: Array[String]): Unit = {
    val counter = Source.fromFile(SrcDestination)("UTF-8")
      .getLines
      .map(l => Word.findAllIn(l.toLowerCase()).toSeq)
      .toStream
      .groupBy(identity)
      .mapValues(_.length)
    println(counter)
  }
}
Don't worry about the regular expression. I would like to know how to extract the individual words from the sequence retrieved in this line:
map(l => Word.findAllIn(l.toLowerCase()).toSeq)
so that the occurrences of each word are counted. Currently I'm getting a map that counts whole sequences of words (one per line) instead of individual words.
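A minimal sketch of the change being asked about, assuming the goal is one map entry per word: replacing `map` with `flatMap` flattens the per-line matches into a single sequence of words before grouping. The `FlatMapWordCount` object, the `countWords` helper, and the sample lines are hypothetical names introduced for illustration; the regex is the one from the question.

```scala
object FlatMapWordCount {
  // Regex taken from the question: runs of letters and hyphens.
  val Word = "\\b([A-Za-z\\-])+\\b".r

  // Hypothetical helper: counts words across an iterator of lines.
  def countWords(lines: Iterator[String]): Map[String, Int] =
    lines
      .flatMap(l => Word.findAllIn(l.toLowerCase)) // one element per word, not per line
      .toList
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.length }

  def main(args: Array[String]): Unit = {
    // In-memory sample instead of a file, so the sketch runs on its own.
    val sample = Iterator("the cat and the hat", "The end")
    println(countWords(sample))
  }
}
```

With `map`, each element passed to `groupBy` is a whole line's worth of words, which is why the original code counts sequences. With `flatMap`, each element is a single word.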
You can turn the file's lines into words by splitting them with the regex "\\W+" (flatMap on the iterator returned by getLines is lazy, so the entire file doesn't need to be loaded into memory). To count occurrences, you can fold over a Map[String, Int], updating it with each word (this is much more memory- and time-efficient than using groupBy):
scala.io.Source.fromFile("file.txt")
  .getLines
  .flatMap(_.split("\\W+"))
  .foldLeft(Map.empty[String, Int]) {
    (count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
  }
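The fold above can be wrapped into a small runnable program. This is only a sketch under a few assumptions not stated in the answer: Scala 2.13 or later (for `scala.util.Using`), lower-casing of words as in the question, and dropping the empty strings that `split` produces when a line starts with a non-word character. `WordCountFold`, `countWords`, and `countFile` are hypothetical names.

```scala
import scala.io.Source
import scala.util.{Try, Using}

object WordCountFold {
  // Folds over the words, updating an immutable Map with each one.
  def countWords(lines: Iterator[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\W+")) // lazily split each line into words
      .filter(_.nonEmpty)                   // assumption: ignore empty tokens from leading separators
      .foldLeft(Map.empty[String, Int]) { (count, word) =>
        count + (word -> (count.getOrElse(word, 0) + 1))
      }

  // Assumption: the file is UTF-8; Using closes the Source once counting is done.
  def countFile(path: String): Try[Map[String, Int]] =
    Using(Source.fromFile(path, "UTF-8"))(src => countWords(src.getLines()))

  def main(args: Array[String]): Unit =
    // In-memory sample so the sketch runs without a file on disk.
    println(countWords(Iterator("Hello, world!", "  hello again")))
}
```

Because `foldLeft` consumes the iterator completely before returning, the counting finishes before `Using` closes the underlying file.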