Pseudorandomness

A pseudorandom process is a process that appears to be random but is not. Pseudorandom sequences typically exhibit statistical randomness while being generated by an entirely deterministic causal process. Such a process is easier to produce than a genuinely random one, and has the benefit that it can be used again and again to produce exactly the same numbers - useful for testing and fixing software.

To generate truly random numbers requires precise, accurate, and repeatable system measurements of absolutely non-deterministic processes. Linux uses, for example, various system timings (like user keystrokes, I/O, or least-significant digit voltage measurements) to produce a pool of random numbers. It attempts to constantly replenish the pool, depending on the level of importance, and so will issue a random number. This system is an example, and similar to those of dedicated hardware random number generators.

History

The generation of random numbers has many uses (mostly in statistics, for random sampling, and simulation). Before modern computing, researchers requiring random numbers would either generate them through various means (dice, cards, roulette wheels, etc.) or use existing random number tables.

The first attempt to provide researchers with a ready supply of random digits was in 1927, when the Cambridge University Press published a table of 41,600 digits developed by L.H.C. Tippett. In 1947, the RAND Corporation generated numbers by the electronic simulation of a roulette wheel; the results were eventually published in 1955 as A Million Random Digits with 100,000 Normal Deviates.

John von Neumann was a pioneer in computer-based random number generators. In 1949, Derrick Henry Lehmer invented the linear congruential generator, which was for a long time used in most pseudorandom number generators. Today, most generators in use are based on linear recurrence (for instance, the Xorshift family). With the spread of the use of computers, algorithmic pseudorandom number generators replaced random number tables, and "true" random number generators (hardware random number generators) are used in only a few cases.

Almost random

A pseudorandom variable is a variable which is created by a deterministic procedure, often a computer program or subroutine, which in most cases takes random bits as input. The pseudorandom string will typically be longer than the original random string, but less random (less entropic in the information theory sense). This can be useful for randomized algorithms.

Pseudorandom number generators are widely used in such applications as computer modeling (e.g., Markov chains), statistics, experimental design, etc.

Pseudorandomness in computational complexity

In theoretical computer science, a distribution is pseudorandom against a class of adversaries if no adversary from the class can distinguish it from the uniform distribution with significant advantage.[1] This notion of pseudorandomness is studied in computational complexity theory and has applications to cryptography.

Formally, let S and T be finite sets and let F = {f: ST} be a class of functions. A distribution D over S is ε-pseudorandom against F if for every f in F, the statistical distance between the distributions f(X), where X is sampled from D, and f(Y), where Y is sampled from the uniform distribution on S, is at most ε.

In typical applications, the class F describes a model of computation with bounded resources and one is interested in designing distributions D with certain properties that are pseudorandom against F. The distribution D is often specified as the output of a pseudorandom generator.

Cryptography

For such applications as cryptography, the use of pseudorandom number generators (whether hardware or software or some combination) is insecure. When random values are required in cryptography, the goal is to make a message as hard to crack as possible, by eliminating or obscuring the parameters used to encrypt the message (the key) from the message itself or from the context in which it is carried. Pseudorandom sequences are deterministic and reproducible; all that is required in order to discover and reproduce a pseudorandom sequence is the algorithm used to generate it and the initial seed. So the entire sequence of numbers is only as powerful as the randomly chosen parts - sometimes the algorithm and the seed, but usually only the seed.

There are many examples in cryptographic history of cyphers, otherwise excellent, in which random choices were not random enough and security was lost as a direct consequence. The World War II Japanese PURPLE cypher machine used for diplomatic communications is a good example. It was consistently broken throughout World War II, mostly because the "key values" used were insufficiently random. They had patterns, and those patterns made any intercepted traffic readily decryptable. Had the keys (i.e. the initial settings of the stepping switches in the machine) been made unpredictably (i.e. randomly), that traffic would have been much harder to break, and perhaps even secure in practice.

Users and designers of cryptography are strongly cautioned to treat their randomness needs with the utmost care. Absolutely nothing has changed with the era of computerized cryptography, except that patterns in pseudorandom data are easier to discover than ever before. Randomness is, if anything, more important than ever.

Monte Carlo method simulations

A Monte Carlo method simulation is defined as any method that utilizes sequences of random numbers to perform the simulation. Monte Carlo simulations are applied to many topics including quantum chromodynamics, cancer radiation therapy, traffic flow, stellar evolution and VLSI design. All these simulations require the use of random numbers and therefore pseudorandom number generators, which makes creating random-like numbers very important.

A simple example of how a computer would perform a Monte Carlo simulation is the calculation of π. If a square enclosed a circle and a point were randomly chosen inside the square the point would either lie inside the circle or outside it. If the process were repeated many times, the ratio of the random points that lie inside the circle to the total number of random points in the square would approximate the ratio of the area of the circle to the area of the square. From this we can estimate pi, as shown in the Python code below utilizing a SciPy package to generate pseudorandom numbers with the MT19937 algorithm. Note that this method is a computationally inefficient way to numerically approximate π.

import scipy
N=100000
x_array = scipy.random.rand(N) 
y_array = scipy.random.rand(N) 
# generate N pseudorandom independent x and y-values on interval [0,1)
N_qtr_circle = sum(x_array**2+y_array**2 < 1) 
# Number of pts within the quarter circle x^2 + y^2 < 1 centered at the origin with radius r=1. 
# True area of quarter circle is pi/4 and has N_qtr_circle points within it.
# True area of the square is 1 and has N points within it, hence we approximate pi with
pi_approx = 4*float(N_qtr_circle)/N # Typical values: 3.13756, 3.15156

See also

References

  1. Oded Goldreich. Computational Complexity: A Conceptual Perspective. Cambridge University Press. 2008.

Further reading

This article is issued from Wikipedia - version of the 11/20/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.