[ThinAir Home]
[Table of Contents]
[Quick Start]
[Choosing a Random Number Generator]

# 3 Characteristics of Random Number Sequences

What do you look for in a random number sequence? The easiest answer to this
question is *unpredictability*. But how do you measure unpredictablility?
One way to determine the unpredictability of a number sequence is to look for
characteristics of a perfectly random sequence of numbers. What *are* the
characteristics of a good random sequence? Which characteristics of a random
sequence are important?
Once we know what characteristics are desirable in a random sequence, we can
evaluate PRNGs in terms of the sequences they generate. Just as important,
we can decide which characteristics do not affect our use of the PRNG. If a
particular characteristic has no effect on your problem, there is no need to
consider that characteristic when evaluating PRNGs.

Probably the most fundamental characteristic of a sequence of numbers
is its *distribution*. The PRNGs in the ThinAir library (with a few
exceptions) produce *uniformly distributed* number sequences. Other
distributions include *normal*, *Poisson*, *geometric*,
*binomial*, and *student-t*. Other authors have published algorithms
that convert uniformly distributed number sequences into other distributions.
(See [4, pages 131-133])

Another fundamental characteristic of a random number sequence is its
*range*. What is the smallest number in the sequence? What is the largest?
These two numbers define the range of the whole sequence. The range of a
sequence is more important and more subtle than most people realize. If the
range of a random number sequence is 0 to 64, it is useless for generating
random percentages. One good example of a bad range problem appears in some
simple card games. In these examples, a 16-bit random number generator is
used to choose a layout of a deck of cards. The problem lies in the fact that
the generator can produce at most 65,536 different numbers. A deck of cards
can produce 80.66 * 10^{66} different layouts. If you are very careful in
your use of the generator, this problem is solvable. If not, most of the
possible layouts never occur.

Another useful characteristic of a random sequence is its *type*. What
kind of numbers make up this sequence. Are we talking about bits, integers,
floating point numbers, strings, or something else entirely. Most of these
types can convert into each other relatively easily. In general, PRNGs
fall into three categories:

- PRNGs that generate a sequence of integers.
- PRNGs that generate a sequence of floating point numbers between 0.0
and 1.0.
- PRNGs that generate a stream of bits.

All algorithm-based PRNGs repeat if they are allowed to run for long enough.
This is caused by the fact that the generator has a finite amount of internal
*state*, and by the fact that an algorithm cannot produce truly random
output. Since we know the sequence must repeat, we would obviously like to have
it repeat after as long a time as possible. The number of individual numbers
generated by the PRNG before it begins to repeat is known as the *period*
of the sequence. This sequence of numbers is often called a *cycle*. The
amount of internal state retained by a PRNG defines the upper limit on the
period. A PRNG with a period equal to the maximum predicted by its internal
state is sometimes called a *maximal-period generator*.
Depending on the algorithm used, a PRNG may actually generate more than one
cycle. The more individual cycles a generator has, the shorter the period each
cycle has. The best generator would, of course, have a single cycle that was
longer than we could ever use (effectively infinite). Some multiplicative
congruential random number generators have two cycles, one being very long
(approximately 2^{32}-1 numbers in the sequence), the other being a cycle of
period 1 at 0.

The term *coverage* describes how well a given sequence covers its output
range. For example, a PRNG with an output range of 0 to 2^{32}-1 should
generate every number between 0 and 2^{32}-1 if allowed to run for long
enough. Not all PRNGs cover their entire output range.
If a PRNG has multiple cycles, the coverage issue is even more interesting.
The generator may actually be capable of covering the whole range of output
values. However, only certain values are covered in each cycle. The
multiplicative congruential PRNGs mentioned above have a maximum coverage one
less than expected in the main cycle. This is because an output of zero is never
part of the main cycle.

Many tests have been devised to study coverage. Two of these are the
*equidistribution*, or *frequency*, test and the *gap* test.
The equidistribution test tries to verify that all of the numbers in the
sequence are equally likely. The gap test studies the probabilities of gaps of
different sizes between numbers in a particular range.

One interesting characteristic of a sequence of numbers concerns *runs*.
When each successive number in a subsequence is larger than the previous number,
we have a *run up*. When each successive number in a subsequence is smaller
than the previous number, we have a *run down*. In a good random sequence
we would expect the number of runs up to be pretty close to the number of runs
down. We would also expect that longer runs would be less likely than shorter
runs.

A good sequence of random numbers should not have any *correlation* between
successive numbers. This means that you should not be able to predict the next
number in the sequence by looking at previous numbers. The classic example of
serial correlation in a PRNG is the linear congruential generator. When
a linear congruential PRNG is used to generate pairs of numbers for use as
x, y coordinates on a graph, a pattern often shows up in the graph. The pairs
of numbers tend to fall on a set of parallel lines on the graph. This
*artifact* is a simple result of the algorithm used.
One way to test for high serial correlation is the *Poker* test. The Poker
test looks at the combinations of five successive numbers taken from the
sequence. These numbers are analyzed for pairs, three-of-a-kind, full house,
etc. The probabilities of these combinations should approach that of a random
number stream.

In a truly random number stream, any permutation of a set of numbers is as
likely as any other permutation of the same numbers. If one permutation of a
set of numbers is more likely than another, a few numbers in the set may supply
enough information to help predict following numbers.

Even if a sequence of numbers does not exhibit any obvious patterns, it may
have higher order artifacts. A truly random sequence would have no particular
frequency component more pronounced than any other. Since any PRNGs must repeat
after some length of time, there is at least one dominant frequency component.
Studying the spectral characteristics tells us if there are any others.

For further information, contact G. Wade Johnson
(gwadej@anomaly.org).

#####
© 1997 G. Wade Johnson.
All rights reserved.

http://www.anomaly.org/ThinAir/charactr.html