I am comparing the performance of different ways to count how many lines a file contains.
I did it first using the wc command line tool:
$ time wc -l bigFile.csv
1673820 bigFile.csv
real 0m0.157s
user 0m0.124s
sys 0m0.062s
and then in a clean image of the latest Pharo Core Smalltalk (1.3):
| file lineCount |
Smalltalk garbageCollect.
( Duration milliSeconds: [
    file := FileStream readOnlyFileNamed: 'bigFile.csv'.
    lineCount := 0.
    [ file atEnd ] whileFalse: [
        file nextLine.
        lineCount := lineCount + 1 ].
    file close.
    lineCount ] timeToRun ) asSeconds.
15
How can I speed up the Smalltalk code so that it gets closer to the performance of wc?
[ (PipeableOSProcess waitForCommand: 'wc -l /path/to/bigfile2.csv') output ] timeToRun.
The above reports ~207 milliseconds, whereas time reported:
real 0m0.160s
user 0m0.131s
sys 0m0.029s
I'm kidding, but also serious. No need to reinvent the wheel. FFI, OSProcess, Zinc, etc. provide ample opportunity to utilize things like UNIX utilities that have been battle-tested over decades.
If your question was really more about Smalltalk itself, a start would be:
[ FileStream
    readOnlyFileNamed: '/path/to/reallybigfile2.csv'
    do: [ :file | | count |
        count := 0.
        "Read raw bytes so that each element can be compared with 10 (LF)."
        file binary.
        file contents do: [ :c | c = 10 ifTrue: [ count := count + 1 ] ].
        count ]
] timeToRun.
That will get you down to 2.5 seconds.
A cleaner version, though about half a second slower, would be:
file contents occurrencesOf: 10.
Of course, if better performance is needed, and you don't want to use FFI/OSProcess, you would then write a plugin.
If you can afford to read the whole file into memory, then the simplest code is:
[ FileStream
    readOnlyFileNamed: '/path/to/reallybigfile2.csv'
    do: [ :file | file contents lineCount ]
] timeToRun.
This will handle the zoo of LF (Linux), CR (old Mac), and CR-LF (you name it). The code from Sean only handles LF, for approximately the same cost. I'd say a factor of 10 for Smalltalk vs C is expected for such basic operations, so I doubt you will get much more efficiency without adding your own primitives.
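If the file is too large to load in one go, a possible middle ground is to read it in fixed-size chunks and count the LF bytes in each one. This is only a minimal sketch, not something I have timed. It assumes that next: on a stream in binary mode returns a ByteArray that may be shorter than requested at the end of the file. The chunk size of 65536 is an arbitrary choice. Like Sean's code, it only counts LF (byte 10), so a file using bare CR line endings would report 0.

```smalltalk
| count chunk |
count := 0.
FileStream
    readOnlyFileNamed: '/path/to/reallybigfile2.csv'
    do: [ :file |
        "Binary mode, so each element is a byte that can be compared with 10 (LF)."
        file binary.
        [ file atEnd ] whileFalse: [
            "Assumption: 65536 is an arbitrary chunk size; the last chunk may be shorter."
            chunk := file next: 65536.
            chunk do: [ :byte | byte = 10 ifTrue: [ count := count + 1 ] ] ] ].
count
```

Only one chunk is held in memory at a time, so memory use stays bounded regardless of the file size. The per-byte loop is still run by the Smalltalk VM, so I would not expect this to close the gap with wc.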