Imagine 10 cars randomly, uniformly distributed on a round track of length 1. If the positions are represented by a C double in the range [0,1> then they can be sorted and the gaps between the cars should be the position of the car in front minus the position of the car behind. The last gap needs 1 added to account for the discontinuity. In the program output, the last column has very different statistics and distribution from the others. The rows correctly add to 1. What's going on? <pre class="prettyprint lang-c prettyprint-override"><code>#include <stdio.h> #include <stdlib.h> #include <stdint.h> int compare (const void * a, const void * b) { if (*(double*)a > *(double*)b) return 1; else if (*(double*)a < *(double*)b) return -1; else return 0; } double grand_f_0_1(){ static FILE * fp = NULL; uint64_t bits; if(fp == NULL) fp = fopen("/dev/urandom", "r"); fread(&bits, sizeof(bits), 1, fp); return (double)bits * 5.421010862427522170037264004349e-020; // https://stackoverflow.com/a/26867455 } int main() { const int n = 10; double values[n]; double diffs[n]; int i, j; for(j=0; j<10000; j++) { for(i=0; i<n; i++) values[i] = grand_f_0_1(); qsort(values, n, sizeof(double), compare); for(i=0; i<(n-1); i++) diffs[i] = values[i+1] - values[i]; diffs[n-1] = 1. + values[0] - values[n-1]; for(i=0; i<n; i++) printf("%.5f%s", diffs[i], i<(n-1)?"\t":"\n"); } return(0); } </code></pre> Here is a sample of the output. The first column represents the gap between the first and second car. The last column represents the gap between 10th car and the first car, across the start/finish line. Large numbers like .33 and .51 are much more common in the last column and very small numbers are relatively rare. <pre class="prettyprint"><code>0.13906 0.14241 0.24139 0.29450 0.01387 0.07906 0.02905 0.03160 0.00945 0.01962 0.01826 0.36875 0.04377 0.05016 0.05939 0.02388 0.10363 0.04640 0.03538 0.25037 0.04496 0.05036 0.00536 0.03645 0.13741 0.00538 0.24632 0.04452 0.07750 0.35176 0.00271 0.15540 0.03399 0.05654 0.00815 0.01700 0.24275 0.25494 0.00206 0.22647 0.34420 0.03226 0.01573 0.08597 0.05616 0.00450 0.05940 0.09492 0.05545 0.25141 0.18968 0.34749 0.07375 0.01481 0.01027 0.00669 0.04306 0.00279 0.08349 0.22796 0.16135 0.02824 0.07965 0.11255 0.05570 0.05550 0.05575 0.05586 0.07156 0.32385 0.12799 0.18870 0.04153 0.16590 0.02079 0.06612 0.08455 0.14696 0.13088 0.02659 0.00810 0.06335 0.13014 0.06803 0.01878 0.10119 0.00199 0.06656 0.20922 0.33263 0.00715 0.03261 0.05779 0.47221 0.13998 0.11044 0.06397 0.00238 0.04157 0.07190 0.33703 0.02945 0.06164 0.01555 0.03444 0.14547 0.02342 0.03804 0.16088 0.15407 0.10912 0.14419 0.04340 0.09204 0.23033 0.09240 0.14530 0.00960 0.03412 0.09950 0.20165 0.09222 0.04268 0.17820 0.19159 0.02074 0.05634 0.00237 0.09559 0.11863 0.09296 0.01148 0.20442 0.07070 0.05221 0.04591 0.08455 0.25799 0.01417 0.16561 0.08846 0.07075 0.03732 0.11721 0.03095 0.24329 0.06630 0.06655 0.08060 0.19857 0.06225 0.10971 0.10978 0.01369 0.13479 0.17539 0.17540 0.02690 0.00464 0.18744 0.09431 0.10851 0.05079 0.07846 0.00162 0.00463 0.06533 0.18752 0.30896 0.09986 0.23214 0.11937 0.10215 0.04040 0.02876 0.00979 0.02443 0.21859 0.15627 0.06811 0.04522 0.07920 0.02432 0.01949 0.03837 0.10967 0.11123 0.01490 0.03846 0.51915 0.13486 0.02961 0.00818 0.11947 0.17204 0.08967 0.09767 0.03349 0.08077 0.23426 </code></pre>

Your code is ok. The mean value of the last difference it two times larger than the others. The paradox comes from the fact is that rather than selecting 10 points on a unit interval one actually tries to divide it into 11 sub-intervals with 10 cuts.Therefore the expected length of each sub-interval is 1/11. The difference between consecutive points is approaching 1/11 except the last pair because it contains the last sub-interval (between the last point and point 1) and the first one (between point 0 and the first point). Thus the mean of the last difference is 2/11.

<h3>"There is no special points on the circle"</h3> The thing is that, on a circle, one car looks always the same, and so there is no need to relate to zero: you can just relate to the first car. This means that you can fix the first car at zero, and treat the random positions of the other cars as related to it (measured from it). And so, the convenient solution is to fix the first car at zero and think of the 9 numbers you still generate as positions related to the first one. Hope it's a satisfying answer :-) <h3>IDENTITY (or Which diff is first?)</h3> If 10 cars with labels ("1","2" and so on) are placed randomly on a circle, the difference from the "1" to the next will average 1/10. While sorting, the first diff "loses it's identity", what it is changes: it's similar to that if you chose the 1st diff to be the longest one, it would average more. Choosing it based on relation of cars to zero skews (or, in nicer terms: changes) things in a similar manner. The first difference (2nd, 3rd etc.) just becomes something different – defining it as a difference from a given car is more intuitive, and gives an option to use it as a reference (playing nicely with circle symmetry); the distribution of the rest of cars with respect to it is a uniform one. Dealing with the smallest of random points is not that simple. Summary: define what you're calculating, know your definitions and probability is non-intuitive

Uniform Distribution: Bug or Paradox

Q: What is the uniform distribution notation for probability density function?

The uniform distribution notation for the same is A &sim; &sim; U (x,y) where x = the lowest value of a and y = the highest value of b. f (a) = 1/ (y-x), f (a) = the probability density function.

Tags:

c

random

debugging

uniform-distribution

paradox

Imagine 10 cars randomly, uniformly distributed on a round track of length 1. If the positions are represented by a C double in the range [0,1> then they can be sorted and the gaps between the cars should be the position of the car in front minus the position of the car behind. The last gap needs 1 added to account for the discontinuity.

In the program output, the last column has very different statistics and distribution from the others. The rows correctly add to 1. What's going on?

Click to copy

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int compare (const void * a, const void * b)
{
  if (*(double*)a > *(double*)b) return 1;
  else if (*(double*)a < *(double*)b) return -1;
  else return 0;
}

double grand_f_0_1(){
  static FILE * fp = NULL;
  uint64_t bits;

  if(fp == NULL) fp = fopen("/dev/urandom", "r");
  fread(&bits, sizeof(bits), 1, fp);
  return (double)bits * 5.421010862427522170037264004349e-020; // https://stackoverflow.com/a/26867455
}

int main()
{
  const int n = 10;
  double values[n];
  double diffs[n];
  int i, j;
  
  for(j=0; j<10000; j++) {
  
    for(i=0; i<n; i++) values[i] = grand_f_0_1();

    qsort(values, n, sizeof(double), compare);
    
    for(i=0; i<(n-1); i++) diffs[i] = values[i+1] - values[i];
    diffs[n-1] = 1. + values[0] - values[n-1];

    for(i=0; i<n; i++) printf("%.5f%s", diffs[i], i<(n-1)?"\t":"\n");

  }
  
  return(0);
}

Here is a sample of the output. The first column represents the gap between the first and second car. The last column represents the gap between 10th car and the first car, across the start/finish line. Large numbers like .33 and .51 are much more common in the last column and very small numbers are relatively rare.

Click to copy

0.13906 0.14241 0.24139 0.29450 0.01387 0.07906 0.02905 0.03160 0.00945 0.01962
0.01826 0.36875 0.04377 0.05016 0.05939 0.02388 0.10363 0.04640 0.03538 0.25037
0.04496 0.05036 0.00536 0.03645 0.13741 0.00538 0.24632 0.04452 0.07750 0.35176
0.00271 0.15540 0.03399 0.05654 0.00815 0.01700 0.24275 0.25494 0.00206 0.22647
0.34420 0.03226 0.01573 0.08597 0.05616 0.00450 0.05940 0.09492 0.05545 0.25141
0.18968 0.34749 0.07375 0.01481 0.01027 0.00669 0.04306 0.00279 0.08349 0.22796
0.16135 0.02824 0.07965 0.11255 0.05570 0.05550 0.05575 0.05586 0.07156 0.32385
0.12799 0.18870 0.04153 0.16590 0.02079 0.06612 0.08455 0.14696 0.13088 0.02659
0.00810 0.06335 0.13014 0.06803 0.01878 0.10119 0.00199 0.06656 0.20922 0.33263
0.00715 0.03261 0.05779 0.47221 0.13998 0.11044 0.06397 0.00238 0.04157 0.07190
0.33703 0.02945 0.06164 0.01555 0.03444 0.14547 0.02342 0.03804 0.16088 0.15407
0.10912 0.14419 0.04340 0.09204 0.23033 0.09240 0.14530 0.00960 0.03412 0.09950
0.20165 0.09222 0.04268 0.17820 0.19159 0.02074 0.05634 0.00237 0.09559 0.11863
0.09296 0.01148 0.20442 0.07070 0.05221 0.04591 0.08455 0.25799 0.01417 0.16561
0.08846 0.07075 0.03732 0.11721 0.03095 0.24329 0.06630 0.06655 0.08060 0.19857
0.06225 0.10971 0.10978 0.01369 0.13479 0.17539 0.17540 0.02690 0.00464 0.18744
0.09431 0.10851 0.05079 0.07846 0.00162 0.00463 0.06533 0.18752 0.30896 0.09986
0.23214 0.11937 0.10215 0.04040 0.02876 0.00979 0.02443 0.21859 0.15627 0.06811
0.04522 0.07920 0.02432 0.01949 0.03837 0.10967 0.11123 0.01490 0.03846 0.51915
0.13486 0.02961 0.00818 0.11947 0.17204 0.08967 0.09767 0.03349 0.08077 0.23426

578

asked Aug 27 '21 17:08

APA

Video Answer

2 Answers

Your code is ok. The mean value of the last difference it two times larger than the others.

The paradox comes from the fact is that rather than selecting 10 points on a unit interval one actually tries to divide it into 11 sub-intervals with 10 cuts.Therefore the expected length of each sub-interval is 1/11.

The difference between consecutive points is approaching 1/11 except the last pair because it contains the last sub-interval (between the last point and point 1) and the first one (between point 0 and the first point).

Thus the mean of the last difference is 2/11.

176

answered Oct 29 '22 21:10

tstanisl

"There is no special points on the circle"

The thing is that, on a circle, one car looks always the same, and so there is no need to relate to zero: you can just relate to the first car. This means that you can fix the first car at zero, and treat the random positions of the other cars as related to it (measured from it).

And so, the convenient solution is to fix the first car at zero and think of the 9 numbers you still generate as positions related to the first one.

Hope it's a satisfying answer :-)

IDENTITY (or Which diff is first?)

If 10 cars with labels ("1","2" and so on) are placed randomly on a circle, the difference from the "1" to the next will average 1/10. While sorting, the first diff "loses it's identity", what it is changes: it's similar to that if you chose the 1st diff to be the longest one, it would average more. Choosing it based on relation of cars to zero skews (or, in nicer terms: changes) things in a similar manner.

The first difference (2nd, 3rd etc.) just becomes something different – defining it as a difference from a given car is more intuitive, and gives an option to use it as a reference (playing nicely with circle symmetry); the distribution of the rest of cars with respect to it is a uniform one. Dealing with the smallest of random points is not that simple.

Summary: define what you're calculating, know your definitions and probability is non-intuitive

answered Oct 29 '22 20:10

wojand

Related questions
                            
                                Accessing two discontinuous memory blocks as a single continuous block, in C?
                            
                                What do f_bsize and f_frsize in struct statvfs stand for?
                            
                                How to get execution time of c program?
                            
                                Subtracting NULL pointer from a normal pointer generates arithmetic right shift
                            
                                How to get the gcc compiler to not optimize a standard library function call like printf?
                            
                                Draw border (frame) using xlib
                            
                                Inconsistent C99 support in gcc and clang
                            
                                Reliable type-punning across C and C++ standards
                            
                                When returning the difference between pointers of char strings, how important is the order of casting and dereferencing?
                            
                                How to distinguish armhf (ARMv7) and armel (ARMv4) in C code?
                            
                                Where should function attributes go?
                            
                                How does this macro detect alignment issues?
                            
                                When should I use hypot over sqrtl?
                            
                                How do I print a floating-point value for later scanning with perfect accuracy?
                            
                                GDB shows incorrect arguments of functions for stack frames
                            
                                How to prove the functionality of a C stringCompare function with Frama-C?
                            
                                How to make OpenSSL C server only support TLS 1.3?
                            
                                How is it possible to create an array using register storage class in C?
                            
                                C - Do incompatible pointers used for arithmetic violate strict aliasing?
                            
                                Why are const qualified variables accepted as initializers on gcc?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With