
Why do powers of 10 print in scientific notation at the 5th power?

I would like to know if and how the powers of 10 are related to the printing of scientific notation in the console. I've searched R docs and haven't found anything relevant, or that I really understand.

First off, my scipen and digits settings are

    unlist(options("scipen", "digits"))
    # scipen digits
    #      0      7

Now, powers of 10 are printed normally up to the 4th power, and then printing switches to scientific notation at the 5th power.

    10^(1:4)
    # [1]    10   100  1000 10000
    10^(1:5)
    # [1] 1e+01 1e+02 1e+03 1e+04 1e+05

Interestingly, this does not happen for some other numbers larger than 10.

    11^(1:5)
    # [1]     11    121   1331  14641 161051

Judging from the following, 5 digits seem significant.

    100^(1:2)
    # [1]   100 10000
    100^(1:3)
    # [1] 1e+02 1e+04 1e+06

So my questions then are:

1. Why is scientific notation activated between the 4th and 5th power for 10 and not for other numbers?
2. Is the number 5 significant?
3. Furthermore, why 5 and not a number closer to the maximum digits option of 22?

Rich Scriven asked Sep 16 '14 02:09




2 Answers

Well, the answer is actually there in the definition of scipen in ?options, although it's pretty hard to understand what it means without playing around with some examples:

‘scipen’: integer. A penalty to be applied when deciding to print numeric values in fixed or exponential notation. Positive values bias towards fixed and negative towards scientific notation: fixed notation will be preferred unless it is more than ‘scipen’ digits wider.

To see what that means, examine the following three pairs of exactly identical numbers. In the first two cases, the width in characters of the fixed notation is less than or equal to the width of the scientific notation, so fixed notation is preferred.

In the third case, though, the fixed notation is wider (i.e. "more than 0 digits wider"), because the 5 zeros amount to more characters than the 4 characters used to represent the same value using e+nn. As a result, in that case scientific notation is preferred.

    1e+03
    1000
    # [1] 1000

    1e+04
    10000
    # [1] 10000

    1e+05
    100000      ## <- wider
    # [1] 1e+05
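The width comparison described above can be checked directly. This is a minimal sketch, assuming the default scipen = 0; the variable names (fixed, sci, widths) are only for illustration, and forcing each notation through format() stands in for what the printing code decides internally.

```r
# Force each notation for every value separately, then compare character widths.
x <- c(1e3, 1e4, 1e5)

fixed <- sapply(x, format, scientific = FALSE)  # "1000", "10000", "100000"
sci   <- sapply(x, format, scientific = TRUE)   # "1e+03", "1e+04", "1e+05"

widths <- data.frame(
  fixed       = fixed,
  sci         = sci,
  fixed_width = nchar(fixed),
  sci_width   = nchar(sci)
)

# With scipen = 0, fixed notation is kept as long as it is not wider
# than the scientific form.
widths$prefers_fixed <- widths$fixed_width <= widths$sci_width
print(widths)
```

Only the last row should come out with prefers_fixed equal to FALSE, which matches the switch at 1e+05.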

Next, examine some numbers that also end with lots of zeros, but whose representation in scientific notation requires a decimal point (.). For these numbers, scientific notation will be used once you have 6 or more zeros (i.e. more than the 5 characters taken up by one . and the characters e+nn).

    1.1e+06
    1100000
    # [1] 1100000

    1.1e+07
    11000000     ##  <- wider
    # [1] 1.1e+07

Reasoning about the tradeoff gets a bit trickier for most other numbers, for which the values of both options("scipen") and options("digits") come into play, but the general idea is exactly the same.
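To see the scipen penalty shift the cutoff, here is a small sketch. It assumes digits = 7 and uses format() on single values as a stand-in for console printing; the variable names are illustrative only.

```r
# Save the current settings so they can be restored afterwards.
old <- options(scipen = 0, digits = 7)

at_zero <- format(1e5)    # fixed is 6 wide, scientific is 5 wide: scientific wins

options(scipen = 1)
at_one <- format(1e5)     # fixed may now be up to 1 character wider: fixed wins

options(scipen = -2)
at_minus <- format(1000)  # a negative penalty favours scientific even for 1000

options(old)              # restore the original settings

print(c(at_zero, at_one, at_minus))
```

Raising scipen by one moves the switch for powers of 10 up by one power, and negative values pull it down.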

To see some of the slightly surprising complications that come into play, you might want to paste the following into your console (perhaps after first trying to predict where within each series the switch to scientific notation will occur).

    100001
    1000001
    10000001
    100000001
    1000000001
    10000000001
    100000000001
    1000000000001

    111111
    1111111
    11111111
    111111111
    1111111111
    11111111111
    111111111111
    1111111111111
Josh O'Brien answered Sep 20 '22 13:09



I'm confused as to what exactly your question is; or, more specifically, how you would use an answer to this question to change or control the behavior of R. Are you trying to format numbers a certain way? There are better ways to do that.

When you type values like that, the results are implicitly run through one of the print() methods to be formatted "nicely" for the console. Whenever things have to look "nice" on screen, the code to do that is often ugly. Here most of that work is handled by the formatReal function and its helper function, scientific. The latter tracks the following information for a number:

    /* for a number x , determine
     *  sgn    = 1_{x < 0}  {0/1}
     *  kpower = Exponent of 10;
     *  nsig   = min(R_print.digits, #{significant digits of alpha})
     *  roundingwidens = 1 if rounding causes x to increase in width, 0 otherwise
     *
     * where  |x| = alpha * 10^kpower   and  1 <= alpha < 10
     */

Then the former function uses this information to try to make "nice" looking numbers by balancing values to the left and the right of the decimal point. It's a combination of many things, like the order of magnitude of the number and the number of significant digits, as well as environmental influences from the scipen option, etc.
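The decomposition |x| = alpha * 10^kpower from that comment can be mimicked in a few lines of R. This is only a rough sketch: decompose() is a hypothetical helper written for illustration, not R's internal C code, and it leaves out nsig and roundingwidens entirely.

```r
# Hypothetical helper: split each x into sign flag, power of 10 and mantissa.
decompose <- function(x) {
  stopifnot(is.numeric(x), all(is.finite(x)), all(x != 0))
  kpower <- floor(log10(abs(x)))   # exponent of 10
  alpha  <- abs(x) / 10^kpower     # mantissa, with 1 <= alpha < 10
  data.frame(
    x      = x,
    sgn    = as.integer(x < 0),    # 1 if negative, 0 otherwise
    kpower = kpower,
    alpha  = alpha
  )
}

res <- decompose(c(1e5, 161051, -0.0042))
print(res)
```

Note that 1e5 and 161051 share the same kpower; what differs is how many significant digits alpha needs, which is why 11^5 stays in fixed notation while 10^5 does not.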

print() is only meant to make things look "nice." What exactly is nice depends on all the values in a vector. You'll find few hard cutoffs in that code; it's very adaptive. There is no easy way to concisely describe everything it does in the general case (which is what it sounds like you are asking for).

The only thing that is certain is that if you need to have your numbers formatted in a certain way, use a function like sprintf() or formatC() that allows for precise control.
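As a brief illustration of that kind of explicit control, the sketch below formats the same powers of 10 with sprintf() and formatC(). The particular format strings are just one reasonable choice, not the only one.

```r
x <- 10^(1:5)

# Always fixed notation, no decimals.
fixed_sprintf <- sprintf("%.0f", x)

# Fixed notation with a thousands separator.
fixed_formatC <- formatC(x, format = "d", big.mark = ",")

# Always scientific notation, two digits after the decimal point.
sci_formatC <- formatC(x, format = "e", digits = 2)

print(fixed_sprintf)
print(fixed_formatC)
print(sci_formatC)
```

Unlike print(), these return character vectors whose layout does not depend on scipen or on the other values in the vector.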

Of course this behavior depends on class(), and I've pointed to the formatReal code since that's where most of the tricky things happen. But observe the difference when you use integers:

    c(10, 100, 1000, 10000, 100000)
    # [1] 1e+01 1e+02 1e+03 1e+04 1e+05
    c(10L, 100L, 1000L, 10000L, 100000L)
    # [1]     10    100   1000  10000 100000
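The same contrast shows up when the double vector from the question is converted to integer. This is a small sketch assuming the default scipen = 0 and digits = 7, with format() standing in for console printing.

```r
old <- options(scipen = 0, digits = 7)

dbl <- 10^(1:5)          # ^ always returns doubles
int <- as.integer(dbl)   # same values, stored as integers

class(dbl)               # "numeric"
class(int)               # "integer"

dbl_text <- format(dbl)          # goes through the real-number formatting path
int_text <- trimws(format(int))  # integers are never shown in scientific notation

options(old)

print(dbl_text)
print(int_text)
```

All five doubles switch to scientific notation together because one common format is chosen for the whole vector, while the integer vector stays in fixed notation.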
MrFlick answered Sep 19 '22 13:09
