I've been trying to get a standardized estimate of FLOPS across all of the computers on which I've deployed a Python distributed-processing program. I can calculate pystones just fine, but pystones are not particularly well known, and I'm not entirely sure how accurate they really are.
Thus, I need a way to calculate FLOPS (or a module that already does it) on a variety of machines, which may have any variety of CPUs. Since Python is an interpreted language, simply timing a set number of operations won't perform anywhere near the level of, say, Linpack. While I don't need estimates that exactly match one of the big names in benchmarking, I'd like them to be at least reasonably close.
So: is there a way, or a pre-existing module, that will let me measure FLOPS? Otherwise, my only options are compiling with Cython, or trying to estimate capability from the CPU clock speed...
One approach is to fit an analytic model of the operation count, e.g. `log(n**2)*n**2 - 6*n**2 + 8`. A model like this can be fairly close to the number of operations captured by the CPU's performance monitoring unit.
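Purely to make that concrete, here is a trivial evaluation of that fitted model for a few problem sizes (the function name is mine); the resulting values would then be compared against PMU counts:

```python
import math

def op_count_model(n):
    # Fitted operation-count model from the answer above.
    return math.log(n**2) * n**2 - 6 * n**2 + 8

for n in (100, 1_000, 10_000):
    print(f"n={n}: ~{op_count_model(n):,.0f} operations")
```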
Isolate one loop iteration, then count every elementary floating-point operation it performs: additions, multiplications, divisions, and so on. For example, `y = x * 2 * (y + z*w)` is 4 floating-point operations (three multiplications and one addition). Multiply that count by the number of iterations.
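As a minimal sketch of that recipe (the iteration count and operand values are arbitrary assumptions), time the loop and divide the hand-counted operation total by the elapsed time. Note that in pure Python the interpreter overhead dominates, so this measures your program's effective throughput rather than the hardware's peak:

```python
import time

N = 1_000_000          # iterations (arbitrary)
OPS_PER_ITERATION = 4  # hand-counted for the statement in the loop body

x, y, z, w = 0.25, 1.0, 3.3, 4.4   # x chosen so y stays bounded
start = time.perf_counter()
for _ in range(N):
    y = x * 2 * (y + z * w)        # 4 floating-point operations
elapsed = time.perf_counter() - start

print(f"~{N * OPS_PER_ITERATION / elapsed / 1e6:.1f} MFLOPS (pure Python)")
```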
To calculate the FLOPs in a model, here are the rules (a sketch of both follows below):

- Convolutions: FLOPs = 2 x number of kernels x kernel shape x output shape
- Fully connected layers: FLOPs = 2 x input size x output size

The factor of 2 counts each multiply-accumulate as one multiplication plus one addition.
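A minimal sketch of those two rules (function and argument names are my own, not from any particular library), where a "shape" is taken as the product of its dimensions:

```python
from math import prod  # Python 3.8+

def conv_flops(num_kernels, kernel_shape, output_shape):
    # FLOPs = 2 x number of kernels x kernel shape x output shape
    return 2 * num_kernels * prod(kernel_shape) * prod(output_shape)

def fc_flops(input_size, output_size):
    # FLOPs = 2 x input size x output size
    return 2 * input_size * output_size

# Example: 64 kernels of shape 3x3x3 producing a 112x112 output map,
# then a 4096 -> 1000 fully connected layer.
print(conv_flops(64, (3, 3, 3), (112, 112)))  # 43,352,064
print(fc_flops(4096, 1000))                   # 8,192,000
```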
This script is designed to compute the theoretical number of multiply-add operations in convolutional neural networks. It can also compute the number of parameters and print the per-layer computational cost of a given network.
Linpack, or High-Performance Linpack (HPL), is generally the industry standard for measuring FLOPS. I found a Python implementation here, but it might not be of much use. The standard approach (especially if you have a cluster) is to use HPL; unless you want to implement your own parallel Linpack in Python, HPL is the way to go. It is what most of those monster supercomputers on the TOP500 list use to measure their performance.
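For context on how HPL-style runs are scored: the benchmark divides the conventional operation count of an LU-based dense solve, roughly (2/3)n^3 plus lower-order terms, by wall-clock time. A trivial sketch (the problem size and runtime below are made up):

```python
def hpl_gflops(n, seconds):
    # Dominant-term operation count for solving a dense n x n system via
    # LU factorization: (2/3)*n^3 flops, ignoring lower-order n^2 terms
    # that are negligible at benchmark problem sizes.
    return (2.0 / 3.0) * n**3 / seconds / 1e9

# Hypothetical example: n = 20,000 solved in 120 seconds.
print(f"{hpl_gflops(20_000, 120.0):.1f} GFLOPS")  # ~44.4
```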
If you're really hell-bent on doing this, even though it might not make much sense or be of much use, you might want to look into porting the original MPI version to 0MQ (ZeroMQ), which has a nice Python interface.
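If you did go that route, the messaging side via pyzmq is straightforward. A minimal request/reply sketch (endpoint and payload are placeholders, and this is nowhere near a Linpack port):

```python
import zmq

# Minimal ZeroMQ request/reply round trip with pyzmq, all in one process
# for demonstration; in a real port the two sockets would live in
# different processes (or machines), standing in for MPI ranks.
ctx = zmq.Context()

rep = ctx.socket(zmq.REP)       # "worker" end
rep.bind("tcp://*:5555")

req = ctx.socket(zmq.REQ)       # "coordinator" end
req.connect("tcp://localhost:5555")

req.send(b"partial result")     # placeholder payload
print(rep.recv())               # b'partial result'
rep.send(b"ack")
print(req.recv())               # b'ack'
```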