
Why and How to effectively test beta distributions of R as a normal user?

Tags:

r

beta-testing

This question is inspired by a remark Duncan Murdoch made on the r-devel mailing list in response to a bug report about Sweave:

This is fixed in R-patched. (It would have been fixed in 2.12.0 if more people tested the betas...).

Honestly, I've stayed away from beta (that is, development) versions for a number of reasons, and I hear the same reasons from other people:

  1. I am a bit worried that it would somehow conflict with my current R installation. As I need that for work, having to repair it regularly would be a loss of time I can't justify to my boss.
  2. I wouldn't have a clue how to test efficiently. I reckon every test I could come up with has already been run by the development team.
  3. I still find it difficult to figure out when something is a bug, and when (most often) it is my own stupidity kicking in.

But as I understand it, testing would be a valuable contribution to the R community, and I'm willing to do my bit if I can fit it into my own work. I was thinking of keeping the beta on the side and running my scripts through it as a checkup. Saving the constructed objects allows a quick and easy all.equal() to see if something is wrong.
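
A minimal sketch of that side-by-side check, written as a shell function. It assumes a Unix-like system and an analysis script that takes an output path as its first argument and saves its result there with saveRDS(). The function name, the script convention and all paths are hypothetical.

```shell
#!/bin/sh
# compare_across_r STABLE_RSCRIPT TEST_RSCRIPT SCRIPT
# Runs SCRIPT under two R installations and compares the saved results.
# Assumption: SCRIPT takes an output path as its first argument and
# writes its result there with saveRDS(). All names are hypothetical.
compare_across_r() {
    stable_rscript=$1
    test_rscript=$2
    script=$3

    # Scratch directory for the two result files (left in place afterwards).
    outdir=$(mktemp -d) || return 2

    # Run the same script once under each installation.
    "$stable_rscript" "$script" "$outdir/stable.rds" || return 2
    "$test_rscript" "$script" "$outdir/test.rds" || return 2

    # Compare the two results with all.equal() in the R under test.
    # The exit status is 0 only if all.equal() returns TRUE.
    "$test_rscript" \
        -e 'f <- commandArgs(trailingOnly = TRUE)' \
        -e 'res <- all.equal(readRDS(f[1]), readRDS(f[2]))' \
        -e 'print(res)' \
        -e 'quit(status = if (isTRUE(res)) 0 else 1)' \
        "$outdir/stable.rds" "$outdir/test.rds"
}

# Usage (hypothetical paths):
#   compare_across_r /usr/bin/Rscript "$HOME/R-patched/bin/Rscript" analysis.R
```

A non-zero exit status only says that the two results differ. Whether the difference is a bug or an intended change still has to be checked by hand.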

Does anybody have more or better ideas on how I could help with testing, with a minimum of effort and a maximum of efficiency?

I'd also like to promote this a bit more in our department. Apart from "It's time to give back to the community", are there any other good reasons why testing betas is worth the effort? How can I counter the arguments given above?

Edit:

As Dirk Eddelbuettel pointed out in the comments, part of the deal is preventing conflicts with the path variables in Windows. I have some ideas on that, but pointers on how to practically organize your computer for testing R-devel versions are greatly appreciated as well.

Joris Meys asked Nov 29 '10 16:11


2 Answers

I fear you misunderstand. This may not be straightforward or obvious at first, so maybe this helps:

  • "patched" is not "beta". Patched is what R 2.12.1 will be.

  • There is no conflict. It drops in for 2.12.0.

  • It is a separate download, and a nightly build available from here.

  • This is not r-devel but r-patched.

  • It is our duty as users to test pre-releases as well. So if anything, in an ideal world you would have R-patched installed, and R-devel too!

  • Testing can be as easy as installing another version, keeping it outside your path and then adjusting PATH and R_HOME dynamically from a script. Testing means running it on your code and data to prevent you from getting bitten by bugs once the new code is released.
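
The last point can be sketched as a small shell wrapper. This is only an illustration under stated assumptions: a Unix-like system, a second R installed under its own prefix with the launchers in PREFIX/bin, and a hypothetical function name and prefix. On Unix-like systems the R launcher script normally determines R_HOME itself, so the sketch clears any inherited value instead of setting one. On Windows you would more likely set R_HOME explicitly.

```shell
#!/bin/sh
# with_r PREFIX COMMAND [ARGS...]
# Runs COMMAND with the R installation under PREFIX first on PATH,
# without changing the environment of the calling shell.
# Assumption: PREFIX/bin contains the R and Rscript launchers.
with_r() {
    r_prefix=$1
    shift
    (
        # Drop any inherited R_HOME so it cannot point at the other
        # installation; the launcher is expected to set its own.
        unset R_HOME
        PATH="$r_prefix/bin:$PATH"
        export PATH
        "$@"
    )
}

# Usage (hypothetical prefix):
#   with_r "$HOME/R-patched" Rscript my_analysis.R
```

Because the changes happen in a subshell, the everyday R installation stays the default as soon as the command returns.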

Dirk Eddelbuettel answered Oct 04 '22 20:10


I wouldn't have a clue how to test efficiently. I reckon every test I could come up with has already been run by the development team.

I still find it difficult to figure out when something is a bug, and when (most often) it is my own stupidity kicking in.

The problem is, software is not (or not only) going to be used by developers. It is going to be used by people who may not have any programming knowledge at all (I'm speaking generally; this holds for R as well as for any other software).

If the help, the interface, or the general design of the software does not give you enough information on how to do something, that may not be a bug, but it is something that can be improved (and pointed out to the devs).

Also, remember that the developers wrote the software. They know how to use it, and their testing is often biased towards using it correctly and checking that it gives the right result, rather than "trying to break it".

By using it in YOUR way (which may well be "incorrect"), you are effectively running tests that may have escaped the developers, simply because they never thought of using it the way you did.

nico answered Oct 04 '22 22:10