I need my R scripts to always produce the same results for the same input. However, I've noticed that the function logspline
from the package logspline
produces different output on windows OS and ubuntu OS. Both uses logspline
package 2.1.16, though.
Here's the code:
x <- c(0.3453205379,0.3497529927,0.3460179029,0.3433414591,0.3565925053,0.3318019585,0.3322091870,0.3314076990,0.3413768315,0.3305650805,0.3342775671,0.3362692445,0.3321054345,0.5982416984,0.5509602046,0.6600000000,0.3339818725,0.3307459063,0.3314632807,0.3356930476,0.3300000000,0.3324504116,0.3470739551,0.3441385006,0.3316070520,0.3399635743,0.3316989471,0.3308044524,0.3536822479,0.3315414656)
fnc <- logspline(x)
plot.logspline(fnc)
and here's the result on windows - fitted with 7 knots: and here the result on ubuntu - fitted with 5 knots:
Could someone explain me the reason of the difference, please? And is there a way to force the function to produce consistent results in any environment?
Output logspline(x, error.action=0)
on windows:
knots A(1)/D(2) loglik AIC minimum penalty maximum penalty
4 2 40.53 -70.85 113.83 Inf
5 2 97.44 -181.28 5.48 113.83
6 2 99.48 -181.96 NA NA
7 2 102.92 -185.44 -0.13 5.48
8 1 102.86 -181.90 NA NA
the present optimal number of knots is 7
penalty(AIC) was the default: BIC=log(samplesize): log( 30 )= 3.4
Output logspline(x, error.action=0)
on linux:
knots A(1)/D(2) loglik AIC minimum penalty maximum penalty
4 2 87.33 -164.46 8.38 Inf
5 2 91.52 -169.44 2.07 8.38
6 1 91.95 -166.88 NA NA
7 1 93.60 -166.78 0.00 2.07
8 1 93.60 -163.38 NA NA
the present optimal number of knots is 5
penalty(AIC) was the default: BIC=log(samplesize): log( 30 )= 3.4
Output sessionInfo()
on windows:
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] logspline_2.1.16
loaded via a namespace (and not attached):
[1] compiler_4.1.0 tools_4.1.0
Output sessionInfo()
on linux:
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
[5] LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] logspline_2.1.16
loaded via a namespace (and not attached):
[1] compiler_4.1.0 tools_4.1.0
Not a complete answer, but this was way too long for a comment.
I tried different systems and this is what I found:
rocker/r-base:4.1.0
returns 5 knots (gcc
and gfortran
version 10.2.1 LAPACK
openblas-pthread/libopenblasp-r0.3.13)rocker/r-base:3.6.3
returns 5 knots (gcc
and gfortran
version 9.2.1-30 LAPACK
lapack/liblapack.so.3.9.0)gcc
and gfortran
version 10.3.0 (9.3.0 for ubuntu:20.04) LAPACK
lapack/liblapack.so.3.9.0)gcc
, gfortran
, LAPACK
version was used)ubuntu:18.04
with R 4.1.0 returns 7 knots (gcc
and gfortran
version 7.5.0, LAPACK
version liblapack.so.3.7.1, alternatively tested with /atlas/liblapack.so.3.10.3, same results)gcc
and gfortran
version 7.5.0 LAPACK
libopenblasp-r0.2.20)The package logspline uses multiple fortran and C functions, one of them being nlogcensor
. The code below calculates many things, among others the nknots
.
MWE for the internal code in logspline::logspline
(line 483ff of R/logspline.R
in version 0.2.8 (github), for the CRAN version (current 0.2.16) it is line 486ff).
logspline()
calls in nlogcensor()
a C function which is defined in src/nlsd.c
at line 166ff (latest version line 167ff).
library(logspline)
intpars <- c(30L, 0L, 0L, 0L, 1L, 1L, -1L)
data <- c(0.33, 0.3305650805, 0.3307459063, 0.3308044524, 0.331407699,
0.3314632807, 0.3315414656, 0.331607052, 0.3316989471, 0.3318019585,
0.3321054345, 0.332209187, 0.3324504116, 0.3339818725, 0.3342775671,
0.3356930476, 0.3362692445, 0.3399635743, 0.3413768315, 0.3433414591,
0.3441385006, 0.3453205379, 0.3460179029, 0.3470739551, 0.3497529927,
0.3536822479, 0.3565925053, 0.5509602046, 0.5982416984, 0.66,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0)
dpars <- c(-1, 0, 0)
maxp <- 65
kts <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0)
# actual code
z <- .C("nlogcensor",
ip = as.integer(intpars),
coef = as.double(data),
dp = as.double(dpars),
logl = as.double(rep(0, maxp)),
ad = as.integer(rep(0, maxp)),
kts = as.double(kts),
PACKAGE = "logspline")
z$ip[2]
#> 7 some older systems, 5 for newer systems
This narrows down the source of error. And this is where my current journey ends. Unfortunately, I don't have much C experience and no FORTRAN experience.
I tried to follow the C routines, but the code is little formatted and hard to read for me. In line 328 of nlsd.c
(version 0.2.16 line 335), the final nknots
is assigned as intpars[1]=(*spc).nk;
(C is zero indexed, R is 1 indexed).
Maybe you can reprodue the error using the nlogcensor
code call and confirm different different Fortran or LAPACK toolchains.
As can be seen from the text above, there is no simple way to make the package fully reproducible. That is, set.seed(123)
and controlling the version of R and the package itself does not help as I suspect the toolchain version is different and leads to the differences.
The results from the different tests above suggest, that if you have a liblapack
version >= 3.9.0 (or libopenblasp
>= r0.3.13), you get 5 knots. Older versions lead to 7 knots.
As you usually have little control over this, you might want to look into docker and use a specified version, e.g., rocker/r-base:4.1.0
. This might be a bit of an overkill for this problem (not sure how or for what you use the code), but this is the only way I currently see you can achieve full reproducibility as you have control over the toolchain as well.
Hope that clears your problem a bit or points you to a potential solution.
In case you want to replicate the results:
For the rocker/r-base
images, just use docker run --rm -it rocker/r-base:4.1.0
and then within R install.packages("logspline")
and the MWE code from above.
The full install script for ubuntu
is a bit more complicated, see below:
docker run --rm -it ubuntu:18.04
# within the container
# install dependencies
apt update && apt upgrade && \
apt install --no-install-recommends -y software-properties-common dirmngr wget build-essential gfortran
# install r-base
wget -qO- https://cran.rstudio.com/bin/linux/ubuntu/marutter_pubkey.asc | \
tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc && \
add-apt-repository "deb https://cran.rstudio.com/bin/linux/ubuntu $(lsb_release -cs)-cran40/" && \
apt install --no-install-recommends -y r-base
# install lapack and blas
apt install libblas-dev liblapack-dev
# alternatively use this: libatlas-base-dev
# install logspline
R -e "install.packages('logspline', repos='https://cran.rstudio.com')"
# see versions
gcc --version
gfortran --version
R -e "sessionInfo()" | grep usr
# run R
R
and then use the code from the MWE above.
The number of knots should be the same across machines to test if there is a difference. You must keep the number of knots the same because this will also change the statistical inference as well.
x <- c(0.3453205379,0.3497529927,0.3460179029,0.3433414591,0.3565925053,0.3318019585,0.3322091870,0.3314076990,0.3413768315,0.3305650805,0.3342775671,0.3362692445,0.3321054345,0.5982416984,0.5509602046,0.6600000000,0.3339818725,0.3307459063,0.3314632807,0.3356930476,0.3300000000,0.3324504116,0.3470739551,0.3441385006,0.3316070520,0.3399635743,0.3316989471,0.3308044524,0.3536822479,0.3315414656)
fnc1 <- logspline(x, nknots = 5)
plot.logspline(fnc1)
fnc2 <- logspline(x, nknots = 7)
plot.logspline(fnc2)
See I am testing only on windows but when you change the number of knots the graphics change.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With