
Simplest way to count words in a file

Tags:

scala

I'm trying to write, as simply as possible, a Scala program that counts word occurrences in a file. So far I have this piece of code:

import scala.io.Codec.string2codec
import scala.io.Source
import scala.reflect.io.File

object WordCounter {
    val SrcDestination: String = ".." + File.separator + "file.txt"
    val Word = "\\b([A-Za-z\\-])+\\b".r

    def main(args: Array[String]): Unit = {

        val counter = Source.fromFile(SrcDestination)("UTF-8")
                .getLines
                .map(l => Word.findAllIn(l.toLowerCase()).toSeq)
                .toStream
                .groupBy(identity)
                .mapValues(_.length)

        println(counter)
    }
}

Never mind the regular expression. I would like to know how to extract the individual words from the sequence produced by this line:

map(l => Word.findAllIn(l.toLowerCase()).toSeq)

so that the occurrences of each word are counted. Currently I'm getting a map that counts whole sequences of words rather than single words.

Dariusz Mydlarz asked Mar 18 '13 21:03




1 Answer

You can turn the file's lines into words by splitting them with the regex "\\W+". Because getLines returns an Iterator, flatMap is lazy, so the whole file never has to be loaded into memory. To count occurrences, fold over a Map[String, Int] and update it with each word. This avoids the intermediate per-word collections that groupBy builds, so it uses less memory and time.

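scala.io.Source.fromFile("file.txt")
  .getLines()
  .flatMap(_.split("\\W+"))   // split each line into words, lazily
  .filter(_.nonEmpty)         // split yields "" for blank lines or leading punctuation
  .foldLeft(Map.empty[String, Int]) {
    // add one to the running count for this word
    (count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
  }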
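
As a rough, self-contained sketch of the same idea applied to the question's code: replacing map with flatMap flattens the per-line matches into one stream of words, which can then be folded into a count. The object name WordCountSketch, the helper countWords, and the sample lines are made up for illustration; the Word pattern is the one from the question.

```scala
import scala.util.matching.Regex

object WordCountSketch {
  // Pattern taken from the question: runs of letters and hyphens.
  val Word: Regex = "\\b([A-Za-z\\-])+\\b".r

  // Hypothetical helper: counts words over any iterator of lines,
  // so it can be fed from a file or from in-memory sample data.
  def countWords(lines: Iterator[String]): Map[String, Int] =
    lines
      // flatMap (not map) yields individual words instead of one Seq per line
      .flatMap(line => Word.findAllIn(line.toLowerCase))
      .foldLeft(Map.empty[String, Int]) { (counts, word) =>
        counts.updated(word, counts.getOrElse(word, 0) + 1)
      }

  def main(args: Array[String]): Unit = {
    // Made-up sample input; "the" occurs three times after lowercasing.
    val sample = Iterator("The quick brown fox", "", "the lazy dog, the end")
    println(countWords(sample)("the")) // prints 3
  }
}
```

To run this against a real file, the iterator from Source.fromFile(...).getLines() can be passed to countWords. On Scala 2.13 or later, wrapping the Source in scala.util.Using.resource is one way to make sure it is closed afterwards.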
Garrett Hall answered Sep 21 '22 14:09