I've written an algorithm that generates a uniform random float in the range [a,b] that can produce every representable floating-point value with a probability proportional to the covered real number range*: https://github.com/camel-cdr/cauldron/blob/main/cauldron/ran...
* Well, almost. I messed up the probability around subnormals slightly and haven't gotten around to fixing it yet.
Still, the algorithm itself should work for all other values, and it was measurably more accurate than the regular algorithm.
In terms of performance it is about 10x slower than the regular implementation of `(randu32(x)>>8)*(1.0f/(1<<24))` on my Zen1 desktop.
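For reference, a minimal version of that regular method (my sketch; randu32 here stands for any uniform 32-bit PRNG, with the state argument dropped):

    #include <stdint.h>

    extern uint32_t randu32(void);  /* any uniform 32-bit PRNG */

    /* Take 24 random bits and scale by 2^-24: 2^24 equally spaced
       floats in [0, 1), every one exactly representable. */
    static float uniform_float24(void)
    {
        return (randu32() >> 8) * (1.0f / (1 << 24));
    }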
How relevant are subnormals in this context?
i.e. what percentage of the range do they occupy and how often would one expect to need to sample to obtain one?
There are as many subnormals as there are distinct values with the same exponent, so 2^23 for floats and 2^52 for doubles. This includes the number 0, but you may consider it a special case different from the other subnormals; if so, subtract one. How much of the range they occupy also depends on the interval you're requesting. However, for the common interval from 0 to 1, they occupy about 1/127th of the range for floats and 1/1023rd of the range for doubles.
One can reason to this conclusion by noting that the subnormals have the same effective exponent as the smallest normal number (except that they use the 0.xxx representation instead of 1.xxx; the exponent in question is -126 for floats and -1022 for doubles), and then observing that all normal numbers with exponents from that minimum up to and including -1 are also in the range. Since the number 1 is usually not included in the interval, we can ignore it, but even if it were, it would be the only representative of its exponent (0) anyway, so the effect on the size of the range would be minuscule.
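A quick count behind those fractions (my arithmetic, counting representable values rather than measure, in line with the reasoning above):

    binary32 values in [0, 1): 126 normal binades x 2^23 + 2^23 subnormals (incl. 0) = 127 x 2^23
    subnormal share:           2^23 / (127 x 2^23) = 1/127

    binary64 values in [0, 1): 1022 x 2^52 + 2^52 = 1023 x 2^52
    subnormal share:           2^52 / (1023 x 2^52) = 1/1023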
You will never ever ever obtain one.
Even calculating your exponent with a single count_trailing_zeros on a single 64 bit value will give you results indistinguishable from perfect.
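One way to flesh that idea out (my sketch, not the commenter's code; it spends a second 64-bit draw on the mantissa instead of reusing the same word, and __builtin_ctzll is a GCC/Clang builtin):

    #include <stdint.h>
    #include <string.h>

    extern uint64_t rng64(void);  /* hypothetical uniform 64-bit source */

    /* Count-zeros sketch: each extra zero bit halves the probability,
       which maps to stepping the exponent down one binade; the 52-bit
       mantissa is filled from a second draw. A single 64-bit word only
       reaches down to about 2^-65, so subnormals are effectively
       unreachable, as noted above. */
    static double random_unit_ctz(void)
    {
        uint64_t w = rng64();
        int zeros = (w == 0) ? 64 : __builtin_ctzll(w); /* geometric: P(k) = 2^-(k+1) */
        uint64_t exponent = 1022 - zeros;               /* 1022 is the biased exponent of 0.5 */
        uint64_t bits = (exponent << 52) | (rng64() >> 12);
        double d;
        memcpy(&d, &bits, sizeof d);                    /* reinterpret the bits as a double */
        return d;                                       /* value in (0, 1) */
    }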
There's another approach for doing this: generate a random number between 1 and 2 (they all have the same exponent) and then subtract 1 (that's the default implementation in Julia [0]). But I think it also just gives you 2^53 different numbers.
So, between .5 and 1 you miss out on every second representable number, between 0.25 and .5 you miss out on 3/4 of them, and so on.
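A minimal sketch of that bit trick in C (my code, assuming some uniform 64-bit source called rng64; Julia's actual implementation is linked as [0] below):

    #include <stdint.h>
    #include <string.h>

    extern uint64_t rng64(void);  /* hypothetical uniform 64-bit source */

    /* Fix the exponent bits for [1, 2) and fill the 52-bit mantissa with
       random bits, then subtract 1: the result lies in [0, 1) on a fixed
       grid with spacing 2^-52. */
    static double uniform_via_1_to_2(void)
    {
        uint64_t bits = (UINT64_C(0x3FF) << 52) | (rng64() >> 12);
        double d;
        memcpy(&d, &bits, sizeof d);  /* reinterpret the bits as a double in [1, 2) */
        return d - 1.0;
    }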
I guess for many cases that's good enough, but the article seems like a nice improvement.
ETA: Lemire has some thoughts on this [1] and links to what might be a prior solution [2]. Vigna (of xoroshiro fame) writes about it at the bottom of [3] and also links to [2]. So, presumably the implementation described in the article is faster? ("There have been some past attempts to fix these flaws, but none that avoid a huge performance penalty while doing so.")
EDIT2: BTW, one of the things I love about HN (well, the world, really) is that there are people that care deeply that we can uniformly sample floats between 0 and 1 correctly, and all of them, and do it faster.
[0] see https://github.com/JuliaLang/julia/blob/master/stdlib/Random...
[1] https://lemire.me/blog/2017/02/28/how-many-floating-point-nu...
[2] https://mumble.net/~campbell/2014/04/28/uniform-random-float
    https://mumble.net/~campbell/2014/04/28/random_real.c
[3] https://prng.di.unimi.it
That is essentially just the fixed point method mentioned here, and it is uniformly distributed across the range (and faster because you do less), but it does not fill the low bits of the number correctly (53 bits of precision != 2^53 of them between 0 and 1). If you look at the final distribution of numbers and you use under 1-10 million numbers, that method and the division method will appear to be unbiased unless you have a pathological case.
That looks kinda the same end result as the fixed point method, but possibly cheaper to execute.
Well, it has to be. The number can only have 53 bits of precision.
There are more than 2^53 floating point values between 0 and 1, but if you use them all, you'll fail to have a uniform distribution. Either certain regions of the space will be far more likely than other regions, or certain values will be far more likely than other values. The article takes the view that the second of those options is desirable, but it's not clear why.
And the algorithm the article describes is:
1. Generate a random float the uniform way.
2. If it's close to zero and has extra precision available, generate some more random bits to fill in the extra precision.
It's always going to be faster to skip step 2. What are we gaining by not skipping it?
Floats are typically used to approximate reals, so a uniform distribution over individual float values is usually not what is needed. What is wanted is a uniform distribution over equal-sized intervals.
2^53 equally-spaced values within [0, 1) is plenty enough for many use cases, but it's still fundamentally a set of fixed-point rather than floating-point numbers. For a true random floating-point value, all the available precision should be used such that every single value in the domain has some probability mass (except maybe subnormals, but you can usually forget about subnormals). Especially with 32-bit floats it's not that difficult to run out of precision when you start doing math and only have a fixed-point subset of size 2^23 available.
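A concrete illustration (my example, using the 24-bit single-precision method from the top comment): feed the coarse fixed-point output into a log, as an exponential-variate sampler would, and the tail gets cut off far earlier than float itself allows. Compile with -lm.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        float coarse_min = 1.0f / (1 << 24);   /* smallest nonzero value the 24-bit method returns */
        printf("%f\n", logf(coarse_min));      /* about -16.6: no exponential variate beyond this */
        printf("%f\n", logf(0x1p-126f));       /* about -87.3: where normalized floats could reach */
        return 0;
    }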
The floats are a surprisingly bad approximation of the reals. They're barely better than the integers, while their shortcomings are much harder to understand, so I'd say it's like if you want approximately a spaceship and you're choosing between an F-14 Tomcat and a Fiat Punto.
Neither of these things is a spaceship, but, it will be obvious to almost anybody why the Punto isn't a spaceship whereas you are less likely to know enough about the F-14 to see why it's not "good enough".
I think the truly surprising thing is just how well floating point numbers work in many practical applications despite how different they are from the real numbers. One could call it the "unreasonable effectiveness of floating point mathematics".
Why is that surprising? No one ever uses real numbers for anything.
There are many situations where you have something you want to compute to within low number of units in the last place, that seem fairly involved, but there are very often clever methods that let you do it without having to go to extended or arbitrary precision. Maybe not that surprising, but it's something I find interesting.
> The floats are a surprisingly bad approximation of the reals. They're barely better than the integers
They are integers. Each float is a pattern of 64 discrete bits. Discreteness means integers.
In float32 land, if

    x = 0.5  (with bit pattern 0x3f000000)
    y = 0.25 (with bit pattern 0x3e800000)

then

    x + y = 0.75 (with bit pattern 0x3f400000)

But if we just added the bit patterns like integers, we would get 0x7d800000 (a float value of over 10^37). Just because they are discrete doesn't mean they are integers, only that they can be mapped one-to-one with integers. The bit pattern is not the semantic meaning of the value, and you cannot perform correct operations if you ignore the semantics.
> It's always going to be faster to skip step 2. What are we gaining by not skipping it?
Having any number be possible is nice. Having the entire space be reachable when you're sampling a solution space and the solution may be smaller than 1/8. Or having small vector quantities not have their angles ridiculously constrained by quantization.
Most applications don't need it-- indeed, a double has a lot of precision compared to what most people care about, and throwing away some of it at the low end of the randomness range is no big deal. But sometimes, a double is barely (or not even) enough.
> Having the entire space be reachable when you're sampling a solution space and the solution may be smaller than 1/8.
You're out of luck there; no matter what happens you can only generate floating point numbers. If you're worried that the solution lies within the space of floating point numbers, but not within the part that's reachable by generating uniform samples with 1/2^53 resolution, you need to be much, much, much more worried that the solution isn't a floating point number at all.
By contrast, if you're unconcerned with the numeric properties of the numbers you're generating, and what you want is to sample "floating point values" to examine what kind of implementation-specific behavior they might have, you would never want this scheme - you'd want to generate random floating-point values, which you can do by just filling in random bits. Those will be concentrated around zero, which doesn't matter because you're not interpreting them as numbers.
> You're out of luck there; no matter what happens you can only generate floating point numbers. If you're worried that the solution lies within the space of floating point numbers, but not within the part that's reachable by generating uniform samples with 1/2^53 resolution, you need to be much, much, much more worried that the solution isn't a floating point number at all.
2^53 is 15 sigfigs. My answer might be around 10 pico-something and I might need a few digits of precision. Then I've thrown away 11 of my sigfigs and there may not be a numerically stable solution with what's left.
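Putting rough numbers on that (my arithmetic):

    spacing of the 2^-53 grid:        2^-53 ~ 1.1e-16
    relative step near 1e-11:         ~1e-5, so about 5 significant digits survive
    ulp of a full double near 1e-11:  2^-89 ~ 1.6e-27, about 16 significant digits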
As you point out, double precision often isn't enough. But it's much more likely to not be enough if we throw large parts of the precision away.
> certain values will be far more likely than other values
That's fine: for the normal understanding of uniformity on the interval, you want the probability that a value is chosen to be proportional to the "space" between its left and right neighbor, and that space is not constant across the interval.
What you are trying to emulate is picking a real number uniformly between 0 and 1, then returning the closest floating-point number (except 1, maybe).
> but if you use them all, you'll fail to have a uniform distribution
Only if you generate them all with equal probability.
Indeed, there's a simple algorithm using interval arithmetic to do so (within bounded time too!).
Think of it as binary subdivision/search of [0,1], using a stream of random bits. Steps:
(1) Divide the current interval in 2 (using its midpoint);
(2) Using one random bit, pick one of the 2 intervals;
(3) If the picked interval (the new current interval) lies entirely within the domain of a single floating-point value [def1], stop and return this value; else go to (1).
[def1] The domain associated with a floating-point value is the interval from the midpoint between it and its lower neighbor (on the left) to the midpoint between it and its higher neighbor (on the right).
I expect the performance to be very poor, but it does cover all floating-point numbers in [0,1] with exactly correct probabilities (assuming the bit probabilities are exactly correct). That's in part because, done naively, you need higher-precision arithmetic, and in part because it takes up to 53 or so iterations, on average well over 32 I presume.
(I've left out the proof that probabilities are correct, but I believe it's easy to show)
You can start with this idea, and then make it _very_ performant by using bit counting instructions. See https://www.corsix.org/content/higher-quality-random-floats for an exposition.
Did you read the rest of my comment? You can't have a uniform distribution no matter what.
Ah, sorry I didn't see the "or certain values will be far more likely than other values". But I don't understand why that wouldn't fall under the definition of a uniform distribution.
The way I would define a uniform distribution is the following:
For any two floating-point numbers, r1 and r2, which form the range [r1,r2] over the real numbers, and any second pair of floating-point numbers, s1 and s2, which form a range [s1,s2] contained in [r1,r2]: the probability of getting a result in [s1,s2] when sampling from [r1,r2] must be equal to (s2-s1)/(r2-r1) evaluated with infinite precision.
This is obviously possible to achieve.
Given a uniform distribution over an interval (α, β) and a tolerance ε, sample the distribution twice. What is the probability that |x_1 - x_2| < ε?
At some point in my career I worked on floating point hardware design, and I agree with your approach.
Some good references on this which IMO should have been mentioned in the article:
https://pharr.org/matt/blog/2022/03/05/sampling-fp-unit-inte...
https://marc-b-reynolds.github.io/distribution/2017/01/17/De...
> Second, the least significant bits that come from this generator are biased.
I don't think this is true; I'm guessing that the author borrowed this observation from one of the various other articles linked in this thread, that address the situation where we draw a 64-bit random unsigned integer and divide by `1<<64`, instead of drawing 53 bits and dividing by `1<<53`. The former does introduce a bias, but the latter doesn't (but does still limit the fraction of representable values).
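To make the two cases concrete, a small sketch (my code; rng64 stands for any uniform 64-bit source):

    #include <stdint.h>

    extern uint64_t rng64(void);  /* hypothetical uniform 64-bit source */

    /* All 64 bits scaled by 2^-64: the uint64-to-double conversion has to
       round to 53 significant bits, which skews the grid slightly and can
       even return exactly 1.0 for the topmost ~2^10 inputs. */
    static double unit_biased(void)   { return rng64() * 0x1p-64; }

    /* 53 bits scaled by 2^-53: every k/2^53 is exactly representable, so
       nothing is rounded, but only 2^53 of the ~2^62 doubles in [0,1) occur. */
    static double unit_unbiased(void) { return (rng64() >> 11) * 0x1p-53; }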
I'm not sure which PRNG is used here, but some PRNGs have regularities in the lower bits.
This is e.g. the case for classical LCGs and the xoroshiro*+ family of PRNGs:
> If, however, one has to generate only 64-bit floating-point numbers (by extracting the upper 53 bits) xoshiro256+ is a slightly (≈15%) faster generator with analogous statistical properties. For general usage, one has to consider that its lowest bits have low linear complexity and will fail linearity tests (https://prng.di.unimi.it/)
> When m = 2^k there is a further problem. The period of the b-th bit of an LCG (where bits are numbered from the right, starting at one) is 2^b; thus although the period is 2^k, only the high bits are good and the lower order bits exhibit a clear repeating pattern. (https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf)
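A tiny demonstration of the power-of-two-modulus LCG point (my example; the constants are Knuth's MMIX parameters, not taken from the thread):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t s = 42;  /* any seed */
        for (int i = 0; i < 16; i++) {
            s = s * 6364136223846793005ULL + 1442695040888963407ULL;  /* LCG mod 2^64 */
            printf("%d", (int)(s & 1));  /* lowest bit has period 2: it just alternates */
        }
        printf("\n");     /* prints 1010101010101010 */
        return 0;
    }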
> For unbiased random floating-point numbers, generate floating-point numbers with probabilities given by drawing a real number and then rounding to floating point.
Immediately there are alarms going off in my mind. Your machine definitely can't pick real numbers; Almost All of them are non-computable. So you definitely can't be doing that, which means you should write down what you've decided to actually do, because it's not that.
What you actually want isn't the reals at all, you want a binary fraction (since all your f64s are in fact binary fractions) between 0 and 1, and so you just need to take random bits until you find a one bit (the top bit of your fraction), counting all the zeroes to decide the exponent, then you need 52 more bits for your mantissa.
I'm sure there's a faster way to get the same results, but unlike the article's imagined "drawing a real number" this is actually something we can realize, since it doesn't involve non-computable numbers.
Edited: You need 52 more bits, earlier this comment said 51 but the one bit you've already seen is implied in the floating point type, so we need 52 more, any or all of which might be zero.
The part you're quoting is the spec, not the implementation. It implicitly assumes a machine model where sampling a uniformly random real number in an interval and correctly rounding it to a floating-point number are both supported operations. Of course this cannot be a real machine or even a Turing machine, but you can imagine and mathematically model a more powerful machine which can do these things. (Turing machines are imaginary anyways.)
This unimplementable algorithm for an imaginary machine is still useful, because it nails down the probability distribution we expect the output samples to have. And because they're sampled from a finite value space and the probabilities of obtaining each possible output are rational numbers, we have a chance of finding an implementable algorithm for a real machine that produces the same output distribution.
The rest of the article goes into detail on what they decided to actually do. It's similar to your proposal but has more edge-case handling and deals with different rounding modes.
> Turing machines are imaginary anyways.
Sure, but there is a simple correspondence between modern/typical computers (that does require a potentially unlimited supply of new hard drives, but that is rarely relevant) and Turing machines, while such a correspondence can probably never exist for the "real number" model.
If you can generate a sequence of independent fair coin flips of indefinite length (this is a big if, but hold that thought), then you can effectively generate a true real number uniformly from the unit interval. What you can't do is access all of its digits at once. But to correctly round it to a value representable in a given floating-point representation you don't need all the digits; you only need enough digits to decide whether the number falls into an interval where all reals round to the same value.
As for generating independent fair coin flips of indefinite length, none of the PRNGs qualify.
I don't think you can make the non-computable reals with your "indefinite fair coin flips". All you're making there is an approximation.
You can make all the rationals this way, but that's Almost None of the real numbers.
Generalizing this to arbitrary intervals, not just [0, 1) looks tricky. Just scaling and shifting a perfect uniform result from [0, 1) doesn't result in a perfect random floating-point number.
Indeed. It might even result in out of range values due to implicit rounding (`3.5 + (4.5 - 3.5) * (the largest float smaller than 1)` results in 4.5, which is unexpected).
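A quick way to check that claim (my example; compile with -lm):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double lo = 3.5, hi = 4.5;
        double u  = nextafter(1.0, 0.0);       /* largest double strictly below 1 */
        double x  = lo + (hi - lo) * u;        /* rounds up to exactly 4.5 */
        printf("%.17g %d\n", x, x == hi);      /* prints: 4.5 1 */
        return 0;
    }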
I'm a contributor to the PHP programming language and a maintainer of its randomness functionality. I did some research into this topic in order to provide a floating-point generation method that is as good as possible, and came across the "Drawing Random Floating-point Numbers from an Interval" paper by Frédéric Goualard.
PHP 8.3 ships with a method generating random floats from arbitrary intervals based on the γ-section algorithm described in that paper. The PHP documentation goes into additional detail:
https://www.php.net/manual/en/random-randomizer.getfloat.php
As part of the implementation I also reached out to Prof. Goualard with some questions for clarification regarding the behavior for extreme values, which resulted in a corrigendum being published, since the algorithm was buggy in those cases (https://frederic.goualard.net/publications/corrigendum.pdf). The entire exchange with Prof. Goualard was really pleasant.
Cool observation! Despite knowing both about how floating points work and how random number generation works, this never occurred to me.
I do wish the solution were a bit simpler though. Could this be upstreamed into the language's API? https://cs.opensource.google/go/go/+/refs/tags/go1.24.3:src/...
> Could this be upstreamed into the language's API?
If a language is in use, and if people have written code that generates pseudorandom sequences based on an initial seed, then no -- bad idea. Some programs rely on the same pseudorandom sequence for a given seed, and may fail otherwise.
That really depends on whether the language's standard library API specifies that implementations will use a particular random number generator, or merely specifies that the RNG will have certain properties. If you're going to depend on reproducible, portable RNG behavior you need to get those random numbers from an API that guarantees a particular algorithm.
I thought this was going to go further down the path that the Die Hard Battery of random number tests started:
https://www.jstatsoft.org/index.php/jss/article/view/v007i03...
(Dieharder is probably a better suite to use at this point, but that paper is approachable.)
Better test suites are PractRand and TestU01.
So, fun fact:
Between 0 and 1, you have about a quarter of all floating point numbers (and then a quarter > 1, a quarter < -1, and a quarter between -1 and 0).
Between 1 and 2, you have about 0.024% of all floating point numbers (for double precision, a factor of around 1023).
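Checking that fun fact against the raw counts (my arithmetic, counting finite doubles only):

    finite doubles:     2 signs x (2046 normal binades x 2^52 + 2^52 subnormals incl. 0) = 2 x 2047 x 2^52
    doubles in [0, 1):  1022 x 2^52 + 2^52 = 1023 x 2^52  -> 1023/4094 ~ 25% of all finite doubles
    doubles in [1, 2):  2^52                              -> 1/4094   ~ 0.024%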
Yeah I've messed with this and always end up wishing the simple ways were actually good enough. The bit where floats aren't really evenly spaced just gets annoying. Still, I kinda get the itch to cover all cases. Respect.
Probably relevant: https://prng.di.unimi.it/
I use the PRNGs here for my own needs and they work great... at least as far as I can tell. :-)
There's the xoroshiro family, and there's the PCG family:
https://www.pcg-random.org
I haven't seen these before. Thanks for the reference.
Why not just read 64 bits off /dev/urandom and be done with it? All this additional complexity doesn't actually buy any "extra" randomness over this approach, and I'm skeptical that it improves speed either.
The problem is, there's around 2^62 double precision numbers between 0 and 1, but they're not uniformly spaced-- there's many, many more between 0 and 0.5 (nearly 2^62) than there are between 0.5 and 1 (around 2^52), for instance.
So, if you want a uniform variate, but you want every number in the range to be possible to generate, it's tricky. Each individual small number needs to be much less likely than each individual large number.
If you just draw from the 2^62 space randomly, you almost certainly get a very small number.
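Putting rough numbers on "almost certainly a very small number" (my arithmetic): the exponent of a bit pattern drawn uniformly from those ~2^62 values is itself roughly uniform, so

    about half of the ~2^62 doubles in (0, 1) lie below roughly 2^-512
    only about 1% of them (10 binades x 2^52 values) lie above 1/1024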
Seems to me that the simplest solution would be to repeatedly divide the range of numbers into two halves, then randomly select either one until it converges onto a single value. In C this might look something like this:

    double random_real(double low, double high, int (*random_bit)(void)) {
        if (high < low)
            return random_real(high, low, random_bit);
        double halfway, previous = low;
        while (1) {
            halfway = low + (high - low) / 2;
            if (halfway == previous)
                break;
            if (random_bit() & 1)
                low = halfway;
            else
                high = halfway;
            previous = halfway;
        }
        return halfway;
    }

That should theoretically produce a uniformly-distributed value. (Although perhaps I've missed some finer point?)

So you have two doubles, halfway and previous, and a recursion that depends on if (halfway == previous) to terminate, where halfway is the result of a floating-point calculation. Are you sure that's going to work? I suspect it will frequently fail to terminate when you expect it to.
Secondly, why does this generate a uniform random number? It’s not clear to me at all. It seems it would suffer the exact problem GP’s talking about here, that certain ranges of numbers would have a much higher probability than others on a weighted basis.
> Secondly, why does this generate a uniform random number?
Each interval of equal size occurs with equal likelihood at each step.
Consider that you want to generate a random number between 0 and 1024 (excl.). The midpoint would be 512, thus you choose randomly whether the lower interval [0, 512) or the higher interval [512, 1024) is selected. In each step, the size of the current range is independent of the concrete numbers, i.e. after n steps it is exactly 2^(-n) * (high - low), and in each step each half has equal probability. Thus, if the algorithm terminates, the selected number was in fact uniformly sampled.
I would also presume it must terminate. Assume that the two endpoints are one ulp apart. The midpoint is thus either of the two, there is no randomness involved (barring FPU flags but they don't count). In the next step, the algorithm either terminates or sets the endpoints equal, which also fixes the midpoint. Thus the procedure always returns the desired result.
The issues that the GP is grappling with are largely due to the fact that they are trying to "construct" real numbers from a stream of bits. That is always going to lead to bias issues. On the other hand, with this particular algorithm (assuming a truly random source) the resulting number should be more or less completely uniform. It works because we are partitioning the search space itself in such a way that all numbers are as likely as any other. In fact, that the algorithm terminates rather predictably essentially proves just that. After one million invocations, for example, the average number of iterations was something like 57 (with the minimum being 55 and the maximum outlier 74). Which is to say you could pick any number whatsoever and expect to see it no more than once per ~2^57 invocations.
I was curious about this. On the one hand, comparing doubles with == is rarely a good idea but, on the other hand, your explanation seems valid.
After some testing I discovered a problem, but not with the comparison. The problem is with calculating the halfway value. There are some doubles where their difference cannot be represented as a double:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <float.h>

    double random_real(double low, double high, int (*random_bit)(void)) {
        if (high < low)
            return random_real(high, low, random_bit);
        double halfway, previous = low;
        while (1) {
            halfway = low + (high - low) / 2;
            if (halfway == previous)
                break;
            if (random_bit() & 1)
                low = halfway;
            else
                high = halfway;
            previous = halfway;
        }
        return halfway;
    }

    int main(int argc, char *argv[]) {
        srand(time(NULL));
        for (int i = 0; i < 1000000; i++) {
            double r = random_real(-DBL_MAX, DBL_MAX, rand);
            printf("%f\n", r);
        }
    }
Actually, the problem is that the algorithm is having to calculate DBL_MAX + DBL_MAX, which of course is going to exceed the maximum value for a double-precision number (by definition). That isn't a very realistic use case either, but in any case you could just clamp the inputs like so:

    double random_real(double low, double high, int (*random_bit)(void)) {
        if (high < low)
            return random_real(high, low, random_bit);
        const double max = DBL_MAX / 2;
        if (high > max)
            high = max;
        const double min = -max;
        if (low < min)
            low = min;
        double halfway, previous = low;
        while (1) {
            halfway = low + (high - low) / 2;
            if (halfway == previous)
                break;
            if (random_bit() & 1)
                low = halfway;
            else
                high = halfway;
            previous = halfway;
        }
        return halfway;
    }
Fixed it!
Yep, that works too. You likely won't ever need a random number that large, of course, but if you did that would be the way to do it.
That wouldn't produce uniformly distributed random floating-point numbers.
"Perfect Random [sic] Floating-Point Numbers" ???
I had hoped that, somewhere in the article, the author would say, "In this article, the term 'random' is shorthand for 'pseudorandom'." But no.
Programming students might read the article and come away with the idea that a deterministic algorithm can generate random numbers.
This is like the sometimes-heard claim that "Our new error-detecting algorithm discovers whether a submitted program contains errors that might make it stop." Same problem -- wrong as written, but no qualifiers.
But this article converts another source of randomness; it doesn't care where the random bits come from.
It will be just as happy with dice rolls, the PRNG from Commodore 64 BASIC, the built-in random numbers from a modern Intel CPU, or "random" bits harvested by sampling keyboard input; it just takes the "random" bits and makes floating-point values.
So there's no reason to say it must be pseudo-random. In most practical applications it would be, but that doesn't seem essential AFAICT.
Something interesting happened while reading...
(Sorry, non-native English speaker; originally in German:) "Hold on, are you saying that in all likelihood there are no random numbers a computer could generate at all, since 'bits' can only ever take either the value '0' or '1' (and that across several layers these days, but that's another story...)?"
let me try to explain it a little...
3 * (A = >0) adding... 1 * (A = 1) (..another topic yes...)
...but that what was wanted is "NONE" (of it...) or in other words 'maybe random'??
??
:confused: