I'm trying to generalize a neural network function to arbitrarily many layers, so I need multiple matrices to hold the weights for each neuron in each layer. I was originally declaring explicit matrix objects in R to hold my weights for each layer. Instead of having one matrix per layer, I thought of a way (not saying it's original) to store all of my weights in a single array, and defined an "indexing function" to map each weight to its appropriate index in the array.
I defined the function as follows:
where w(i, j, k) is the k-th weight of the j-th neuron in the i-th layer and L(r) is the number of neurons in layer r. After writing these definitions, I realized that Stack Overflow doesn't allow LaTeX like MathOverflow does, which is unfortunate. Now the question is: is it more efficient to compute the index of my weights in this way, or is it actually less efficient? After looking up how indices are computed for arrays in general, this is essentially what is done at compile time anyway if I just kept a matrix in each layer holding the weights, so it seems like I may just be making my code overly complicated and harder to understand if there's no difference in time efficiency.
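For concreteness, here is a rough sketch in R of what such an indexing function could look like. The original formula didn't survive the formatting, so the layout below, an L(i-1) x L(i) block of incoming weights per layer, flattened column by column, is an assumption rather than the question's exact definition:

```r
# Hypothetical indexing function: maps (layer i, neuron j, weight k) to a
# position in one flat weight vector. L is a vector with L[r] = number of
# neurons in layer r (layer 1 being the input layer). This layout is an
# assumption; the question's own formula may differ.
weight_index <- function(i, j, k, L) {
  # weights of all layers before layer i occupy the first `offset` positions
  offset <- if (i > 2) sum(L[1:(i - 2)] * L[2:(i - 1)]) else 0
  # within layer i, neuron j has L[i - 1] incoming weights; k picks one of them
  offset + (j - 1) * L[i - 1] + k
}

# Example: layer sizes 3 -> 4 -> 2, so 3*4 + 4*2 = 20 weights in total
L <- c(3, 4, 2)
w <- numeric(sum(L[-length(L)] * L[-1]))   # one flat array for all weights
w[weight_index(3, 2, 1, L)] <- 0.5         # 1st weight of 2nd neuron in layer 3
```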
There are many factors to take into consideration with each of the approaches. I'm not familiar with R, but I'm assuming a matrix's buffer is represented as a one-dimensional array in memory. (Even if matrices are written as two-dimensional arrays in the underlying C implementation, the compiler stores them as a one-dimensional array in memory.)
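(For what it's worth, base R does store a matrix exactly this way: one contiguous vector in column-major order carrying a dim attribute, as a quick sketch shows.)

```r
# R matrices are a single contiguous vector in column-major order plus a dim attribute
m <- matrix(1:6, nrow = 2)
as.vector(m)               # 1 2 3 4 5 6: the underlying one-dimensional buffer
m[2, 3]                    # 6
m[2 + (3 - 1) * nrow(m)]   # 6 again, the same element via its flat index
```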
The overall outline of memory operations is:
Case: several matrices, one per layer. Every access first has to locate the matrix object for the given layer and then index into it, and the number of separately allocated blocks grows with the number of layers.
Case: one array for all layers plus index calculation. There is a single allocation, and each access is one offset computation (the indexing function) followed by one read from a contiguous block.
We can clearly see that the second case scales better, even though there's the additional cost of the function call.
Having said that, in general, having a statically allocated array with all the weights for all the layers should be faster.
In most cases, a computer's bottleneck is memory bandwidth, and the best way to counteract this is to minimize the number of memory accesses.
With this in mind, there's another, more primitive reason why the second approach will probably be faster: caches.
Here's a good explanation by Good Ol' Bob Martin of the performance difference when accessing a two-dimensional array in a loop.
TL;DR: Caches take advantage of the principle of locality, so memory accesses that are spatially close to each other (as they would be in one single array, accessed in a cache-friendly order as explained in Bob Martin's answer) perform better than accesses that are spatially spread out (across several distinct arrays).
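As a small R-flavoured illustration of that locality effect (the matrix size is arbitrary and exact timings are machine-dependent): extracting columns from a column-major matrix reads contiguous memory, while extracting rows strides across it.

```r
# Column-major storage: column reads are contiguous, row reads are strided
m <- matrix(runif(4000 * 4000), nrow = 4000)
system.time(for (j in 1:4000) tmp <- m[, j])  # contiguous column reads (cache-friendly)
system.time(for (i in 1:4000) tmp <- m[i, ])  # strided row reads (cache-unfriendly)
```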
PS: I also recommend benchmarking both approaches and comparing, since these cache-related nuances are machine-dependent. It might be the case that the dataset/NN is small enough to fit completely in RAM, or even in cache on a very powerful server.
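A minimal benchmarking sketch in base R to make that comparison concrete. The layer sizes, repetition count, and the weight_index() helper are illustrative assumptions, not code from the question; note also that in interpreted R the per-access function-call overhead can easily dominate, which is exactly why measuring on your own machine matters.

```r
# Illustrative benchmark: read every weight once through per-layer matrices
# versus through one flat vector addressed by an indexing function.
# L[r] = number of neurons in layer r; the sizes here are arbitrary.
L <- c(64, 64, 64, 64)
set.seed(1)

# Approach 1: one matrix per layer (L[i-1] x L[i] incoming weights of layer i)
mats <- lapply(2:length(L), function(i) matrix(rnorm(L[i - 1] * L[i]), nrow = L[i - 1]))

# Approach 2: one flat vector; weight_index() is a hypothetical helper that
# maps (layer i, neuron j, weight k) to a position in that vector
weight_index <- function(i, j, k, L) {
  offset <- if (i > 2) sum(L[1:(i - 2)] * L[2:(i - 1)]) else 0
  offset + (j - 1) * L[i - 1] + k
}
flat <- unlist(lapply(mats, as.vector))

read_matrices <- function() {
  total <- 0
  for (i in 2:length(L))
    for (j in 1:L[i])
      for (k in 1:L[i - 1])
        total <- total + mats[[i - 1]][k, j]
  total
}

read_flat <- function() {
  total <- 0
  for (i in 2:length(L))
    for (j in 1:L[i])
      for (k in 1:L[i - 1])
        total <- total + flat[weight_index(i, j, k, L)]
  total
}

stopifnot(all.equal(read_matrices(), read_flat()))  # same weights either way
system.time(replicate(20, read_matrices()))
system.time(replicate(20, read_flat()))
```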