U([0, 1)) → U([0, 1])

Motivation

Given a sampler for a uniform distribution on [0,1), can one obtain a sampler for a uniform distribution on [0,1] (i.e. one where the value 1 is possible as well)?

First, a disclaimer: the question itself is ridiculous and a purely mathematical thought experiment. I also don’t have a solution here. But it was fun to think it through. It is ridiculous because there are only two ways to read it. Either you need it in practice on a computer: then you have to consider the realities of IEEE 754, which cannot represent a continuous interval, and you need completely different approaches than the ones presented here. Or you consider the mathematically continuous range [0,1) in ℝ: in this case, measure theory tells you that the measures of [0,1) and [0,1] are equal (the single point 1 has measure zero), and thus the question becomes boring in all mathematical contexts.

Ok, so we can sample values from a uniform distribution U([0, 1)) and need an algorithm that can return one more value, with the same probability as every other value.

First idea

Let \(s ∈ U([0,1))\).

\[ u(s) := \frac{1}{1-s} \]

So with \(f(x) = 1-x\) I can map \(U([0,1))\) to \(U((0,1])\). The function \(f(x) = 1/x\) is fun, because it gets arbitrarily close to 0 but does not reach it. The argument 0 itself is disallowed in ℝ (cf. partial function). We exploit the fact that 1 is never returned by the sampler \(U([0,1))\), so \(1-s\) is never 0 and the division is always defined. However, the idea fails because it completely distorts the uniform probability (the results lie in [1, ∞)), and the value 0 cannot be attained.
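To make the failure concrete, here is a minimal Python sketch of the mapping above. It assumes that `random.random()` is an acceptable stand-in for the sampler \(U([0,1))\); the function name `first_idea` and the seed are only illustrative.

```python
import random


def first_idea(s: float) -> float:
    """u(s) = 1 / (1 - s) for a sample s from [0, 1)."""
    return 1.0 / (1.0 - s)


rng = random.Random(0)  # fixed seed, only for reproducibility
samples = [first_idea(rng.random()) for _ in range(100_000)]

# Every result is at least 1, so nothing in [0, 1) is ever returned.
smallest = min(samples)

# Half of all results fall below 2, the other half is spread over [2, inf).
share_below_2 = sum(x < 2.0 for x in samples) / len(samples)

print(smallest, share_below_2)
```

The smallest result never drops below 1, and roughly half of the mass sits in [1, 2) while the rest is stretched over an unbounded tail, which is about as far from uniform on [0,1] as it gets.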

Second idea

Let \(s ∈ U([0,1))\) and \(t ∈ U([0,1))\).

\[ u(s, t) := \frac{s+(1-t)}{2} \]

Now the idea is to sample twice. We use one value in [0,1) (namely \(s\)) and one value in (0,1] (namely \(1-t\)). Division by 2 just maps the sum from the interval (0, 2) to (0, 1). Addition looks harmless, but it does distort the probability: the sum of two independent uniform samples follows a triangular distribution that peaks in the middle. Oh right, and we already know another reason why this result is invalid: the values 0 and 1 are both excluded.
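A quick simulation shows what averaging two samples does to the shape of the distribution. This is only a sketch: `random.random()` stands in for \(U([0,1))\), and the bin boundaries 0.2, 0.4 and 0.6 are arbitrary choices for illustration. For a uniform result, both bins below would hold about 20 % of the samples.

```python
import random


def second_idea(s: float, t: float) -> float:
    """u(s, t) = (s + (1 - t)) / 2 for samples s, t from [0, 1)."""
    return (s + (1.0 - t)) / 2.0


rng = random.Random(0)  # fixed seed, only for reproducibility
n = 200_000
samples = [second_idea(rng.random(), rng.random()) for _ in range(n)]

# Share of results in a bin of width 0.2 at the edge and in the middle.
edge = sum(x < 0.2 for x in samples) / n
middle = sum(0.4 <= x < 0.6 for x in samples) / n

print(edge, middle)
```

The middle bin collects roughly 36 % of the results and the edge bin only about 8 %, which matches a triangular density rather than a flat one.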

Third idea

Now I started integrating the idea of rejection sampling: draw a sample, test it, and potentially reject it and do something else.

Let \(s ∈ U([0,1))\) and \(t ∈ U([0,1))\).

\[ u(s, t) := \begin{cases} s & \text{if } s < 0.5 \\ 1-t & \text{else} \end{cases} \]

Due to the case distinction, we either pick a value in [0, 0.5) or in (0, 1]. Their union is [0, 1], which is our desired interval. However, did we preserve the properties of a uniform distribution? It immediately looks suspicious that the intervals of the two cases overlap, because this distorts the probability. The value 0.3 might be returned by case 1 or case 2, whereas the value 0.7 can only result from case 2. Case 1 puts probability 1/2 on [0, 0.5) and case 2 spreads its probability 1/2 over all of (0, 1], so the lower half receives 3/4 in total and the upper half only 1/4. Hence, the values in (0, 0.5) are three times as likely as the values in [0.5, 1].
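The imbalance between the two halves can be checked empirically. The sketch below assumes `random.random()` as the sampler for \(U([0,1))\) and simply counts how often the result lands in each half; the names are illustrative.

```python
import random


def third_idea(s: float, t: float) -> float:
    """Return s if s < 0.5, otherwise 1 - t."""
    return s if s < 0.5 else 1.0 - t


rng = random.Random(0)  # fixed seed, only for reproducibility
n = 200_000
samples = [third_idea(rng.random(), rng.random()) for _ in range(n)]

lower = sum(x < 0.5 for x in samples) / n   # share of results in [0, 0.5)
upper = sum(x >= 0.5 for x in samples) / n  # share of results in [0.5, 1]

print(lower, upper)
```

Case 1 always lands in the lower half, and case 2 lands there half of the time, so about three quarters of the results end up below 0.5.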

Fourth idea

So let us align the intervals.

Let \(s ∈ U([0,1))\) and \(t ∈ U([0,1))\).

\[ u(s, t) := \begin{cases} s & \text{if } s < 0.5 \\ 1-\frac{t}{2} & \text{else} \end{cases} \]

Now the two cases cover the intervals [0, 0.5) and (0.5, 1]. So we miss value 0.5 … damn.

Fifth idea

Does it help if we make the condition independent of the returned value?

Let \(r, s, t ∈ U([0,1))\).

\[ u(r, s, t) := \begin{cases} s & \text{if } r < 0.5 \\ 1-t & \text{else} \end{cases} \]

The two cases cover the intervals [0, 1) and (0, 1]. Their union is [0, 1], which is our desired interval. However, the value 0 can only be produced if the first case applies, whereas 1 can only be reached through the second case. All other values can be returned from either case. So the values {0, 1} are less likely than the other values.
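Because single points have probability zero in the continuous setting, the effect on the endpoints is easier to see on a discrete stand-in. The sketch below is an assumption made purely for illustration: it replaces \(U([0,1))\) by a uniform choice from the grid {0, 1/4, 2/4, 3/4} (the resolution `N = 4` is arbitrary) and enumerates all combinations of \(r, s, t\) exactly with fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N = 4  # hypothetical grid resolution, chosen only for illustration
grid = [Fraction(k, N) for k in range(N)]  # discrete stand-in for [0, 1)


def fifth_idea(r: Fraction, s: Fraction, t: Fraction) -> Fraction:
    """Return s if r < 1/2, otherwise 1 - t."""
    return s if r < Fraction(1, 2) else 1 - t


# Exact distribution of the result over all N**3 equally likely inputs.
dist = Counter()
weight = Fraction(1, N**3)
for r, s, t in product(grid, repeat=3):
    dist[fifth_idea(r, s, t)] += weight

for value in sorted(dist):
    print(value, dist[value])
```

On this grid the inner points 1/4, 1/2 and 3/4 each get probability 1/4, while 0 and 1 only get 1/8 each, exactly the imbalance described above.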

Sixth idea

Ok, can we introduce special handling for the one value we want to add?

Let \(s, t ∈ U([0,1))\).

\[ u(s, t) := \begin{cases} \begin{cases} 0 & \text{if } t < 0.5 \\ 1 & \text{else} \end{cases} & \text{if } s = 0 \\ s & \text{else} \end{cases} \]

If we start to create two return values from one sampled value, we obviously split its probability in two. Thus we don’t create a uniform distribution, because the values {0, 1} are each half as likely as any other value.

Seventh idea

Ok, let us get back to previous ideas. If we use less-than-or-equal instead of less-than in the condition, we get a closed interval. This way, we can design our desired interval.
Now, I need to use a loop to express the process:

forever
1. Let \(s, t ∈ U([0,1))\).
2. if \(s\) ≤ 0.5, then return \(s\)
3. if \(t\) ≤ 0.5, then return \(1-t\)

So we either return a value from the interval [0, 0.5] or from [0.5, 1]. Because we discard the rounds in which neither condition is true, it is difficult for me to quantify the probability distribution in this case. I think the fact that the value 0.5 occurs in both cases makes it more likely than other values. I also think this approach is bad because there is no upper bound on the number of rounds: each round returns only with probability 3/4, so the loop ends with probability 1, but we cannot guarantee that it ends after any fixed number of rounds.
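The loop can be simulated to get a feeling for the resulting distribution and for the number of rounds. This is a sketch under the assumption that `random.random()` plays the role of \(U([0,1))\); the function `loop_idea` and its second return value (the round count) are additions for illustration only.

```python
import random


def loop_idea(rng: random.Random) -> tuple[float, int]:
    """Run the loop once; return the sampled value and the rounds needed."""
    rounds = 0
    while True:
        rounds += 1
        s, t = rng.random(), rng.random()
        if s <= 0.5:
            return s, rounds
        if t <= 0.5:
            return 1.0 - t, rounds


rng = random.Random(0)  # fixed seed, only for reproducibility
n = 200_000
results = [loop_idea(rng) for _ in range(n)]

# Share of returned values in the lower half, and average rounds per value.
lower = sum(value <= 0.5 for value, _ in results) / n
mean_rounds = sum(rounds for _, rounds in results) / n

print(lower, mean_rounds)
```

In one round, step 2 returns with probability 1/2, while step 3 is only reached when step 2 fails and therefore returns with probability 1/4. So about two thirds of the results come from [0, 0.5], which means the distribution is not uniform, and a value needs about 4/3 rounds on average.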

Conclusion

I looked at different ideas to extend U([0,1)) to U([0,1]), but failed to find a mathematically beautiful solution. I am not a probability theorist, nor did I do any literature research. I wrote down the question several years ago during my studies as a note and discussed it with work colleagues recently. The brainstorming led to what I wrote in the disclaimer. Finally, I think combining a function and its inverse (functions \(10^x\) and \(\log_{10}(x)\)) might be a viable approach, but I lacked the creativity to combine them usefully. Facts like \(1/x \neq 0\), \(10^x \neq 0\), \(\log_{10}(x) \neq 0\) and \(10^0 = 1\) can include/exclude the missing value. But all of it is just food for thought.