Along with producing incorrect results, one of the worst fears in scientific programming is being unable to reproduce the results you have already generated. What best practices help ensure your analysis is reproducible?
The reproducibility standard is based on the fact that every computational experiment has, in theory, a detailed log of every action taken by the computer.
“Reproducibility” refers to instances in which the original researcher's data and computer code are used to regenerate the results, while “replicability” refers to instances in which a researcher collects new data to arrive at the same scientific findings as a previous study.
Reproducibility is, and has long been, a fundamental principle of the scientific method. The term refers to the ability to take the original researcher's data and analysis and generate the same results. It is often used interchangeably with replicability, another fundamental principle, in which a researcher collects new data to arrive at the same findings as a previous study.
Reproducibility is essential to science because it allows for more thorough scrutiny of a piece of research, while replicability confirms its results. Studies and experiments involve many variables, unknowns, and factors outside your control that you cannot guarantee, which is exactly why being able to rerun the original analysis matters.
The first two points are incredibly important, because making the dataset available allows others to perform their own analyses on the same data, which increases confidence in the validity of your own analyses. Additionally, publishing the dataset online, especially in linked-data formats, makes it possible for crawlers to aggregate it with other datasets and thereby enables analyses over larger collections. In many types of research the sample size is too small to be really confident about the results, but sharing your dataset makes it possible to construct much larger ones. Someone could also use your dataset to validate an analysis they performed on other data.
Additionally, making your code open source allows the code and procedure to be reviewed by your peers. Such reviews often uncover flaws or opportunities for optimization and improvement. Most importantly, it allows other researchers to build on your methods without having to reimplement from scratch everything you have already done. Research moves much faster when researchers can focus on improvements rather than on reinventing the wheel.
As for randomization: many algorithms rely on randomness to achieve their results. Stochastic and Monte Carlo methods are quite common, and while they have been proven to converge in certain cases, different runs can still produce different results. One way to make the outcome dependable is to have a loop in your code that invokes the computation a fixed number of times and keeps the best result. With enough repetitions you can expect to find global or near-global optima instead of getting stuck in local ones. Another possibility is to use a predetermined seed, although that is not, IMHO, as good an approach, since you could pick a seed that leads the search into a local optimum, and there is no guarantee that random number generators on different platforms will produce the same sequence for that seed value.
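As a rough sketch of both ideas in Python (using NumPy and SciPy, with a made-up objective function purely for illustration), a fixed seed plus multiple random restarts looks something like this:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical objective with several local minima (illustrative only).
def objective(x):
    return np.sin(3 * x[0]) + 0.1 * x[0] ** 2

# Fixed seed: the same starting points, and therefore the same reported
# result, on every run -- at least on the same platform and library versions.
rng = np.random.default_rng(seed=42)

# Multiple random restarts: run the local optimizer from several starting
# points and keep the best result, which reduces the chance of reporting
# a poor local optimum.
n_restarts = 20
best = None
for _ in range(n_restarts):
    x0 = rng.uniform(-5.0, 5.0, size=1)
    result = minimize(objective, x0, method="Nelder-Mead")
    if best is None or result.fun < best.fun:
        best = result

print(f"best x = {best.x}, f(x) = {best.fun:.4f}")
```

Recording the seed, the number of restarts, and the library versions alongside the results is what makes a run like this repeatable by someone else.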