Paris Nelson ’14 – “On Random Number Generation”
Paris Nelson graduated from Southwestern University with a Bachelor of Science in computer science with a minor in mathematics. His paper, “On Random Number Generation,” was written as part of former Associate Professor of Computer Science Suzanne Fox Buchele’s computer systems course. Paris plans to be a programmer in the video game industry.
Random number generation is an important process that affects many aspects of our lives. True random number generators rely on physical processes for true randomness, while pseudo-random number generators depend on algorithms or formulae. Tests can be applied to random number generators to measure the randomness of the sequences they generate. Various tests were applied to several random number generators, and the results are discussed. The security of random number generation is also important, especially in the field of cryptography. The original paper can be found in its entirety here. [LINK?]
Random number generation is the process by which a sequence of numbers is generated “randomly”. Here, “random” means that over a given range, every number in that range is represented with equal probability and the numbers occur with an even distribution: there are no predictable patterns or long sequences of consecutively repeated numbers (e.g., “33333”).
Random numbers are utilized in a diverse range of applications, from cyber-security to video gaming. The major source of interest in random number generation is cryptography. Modern uses of cryptography include encrypting sensitive information and maintaining the privacy of communications between entities or of stored data. Encryption protocols currently in use rely on the ability to generate random numbers, and the security of the encryption depends on the unpredictability of the random numbers used. If the sequence of random numbers is too predictable, an attacker could feasibly break the encryption. Random numbers are also useful in mathematics, one example being the Monte Carlo method of integration. Given an area A that cannot be easily integrated, choose a known area B that encapsulates A, pick a number of random points in B, and note how many of them also fall in A. The ratio of the number of points occurring in A to the total number of points then relates the area of A to the known area of B. Here, the predictability of the sequence is immaterial, but the method depends on the uniform distribution of the numbers. Random numbers also have applications in video games, sampling, and modeling. As can be seen, the quality of randomness required varies between applications: some require that the sequence of numbers generated be unpredictable, some require that the numbers be evenly distributed, and others require that the numbers not repeat too often.
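As a concrete illustration of the Monte Carlo idea above, the following Java sketch estimates the area under f(x) = x² on [0, 1] (true value 1/3) by sampling points in the unit square. The class and method names are ours, not the paper’s, and java.util.Random merely stands in for whatever generator is being evaluated.

```java
import java.util.Random;

// Illustrative hit-or-miss Monte Carlo integration (names are ours, not the paper's).
// Estimates the area A under f(x) = x^2 on [0, 1] (true value 1/3) by sampling
// random points in the unit square B and counting how many fall inside A.
public class MonteCarloDemo {
    public static double estimate(long samples, long seed) {
        Random rng = new Random(seed); // any generator could be substituted here
        long inA = 0;
        for (long i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (y <= x * x) inA++; // the point lies under the curve, i.e. in A
        }
        // area(A) ≈ (points in A / total points) × area(B), and area(B) = 1
        return (double) inA / samples;
    }

    public static void main(String[] args) {
        System.out.println(estimate(1_000_000L, 1L));
    }
}
```

With a million samples the estimate lands close to 1/3; as the text notes, what matters here is the uniformity of the points, not their unpredictability.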
Pseudo-Random Number Generators (PRNGs) are methods of generating numbers that appear random but in actuality are not. The numbers are produced by a predetermined algorithm, such as a Linear Congruential Generator (LCG), which is a recurrence of the form Xₙ₊₁ ≡ (aXₙ + c) (mod m), where X₀ is called the “seed” value and a, c, and m are constants. The seed is a random or varying value, for example the time at which the algorithm is invoked. Although the seed may be random, given the constants and any one term in the sequence of outputs, all following outputs can be computed. LCGs are only one type of PRNG, but they exemplify the key features common to all PRNGs: a seed value is fed into a deterministic algorithm, which outputs a deterministic sequence of numbers. While that sequence may seem random, each number is not guaranteed to be produced with equal probability, and the distribution of numbers is not guaranteed to be even (for example, in an LCG where the constants a, c, and m are poorly chosen). Furthermore, PRNGs have cycles in their output: after some number of invocations, the outputs begin to repeat. The maximum number of iterations before this occurs is known as the period of the generator; for an LCG, the length of the period depends on the choices of a, c, and m. These features can cause problems if the application requires a higher quality of randomness, such as in cryptographic communications: if an attacker can determine the constants used and a single output, they can calculate all future outputs of the generator. Likewise, if a PRNG is used to take a random sampling of data, an uneven distribution in the generated numbers would bias the sampling results. In both examples, any cycles in the numbers could be detrimental.
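The LCG recurrence above fits in a few lines of Java. The constants used here (a = 1664525, c = 1013904223, m = 2³²) are a commonly cited illustrative choice, not ones taken from the paper, and the class name is ours:

```java
// Minimal linear congruential generator implementing X_{n+1} = (a*X_n + c) mod m.
// The constants below are a well-known illustrative choice, not from the paper.
public class Lcg {
    private static final long A = 1664525L;
    private static final long C = 1013904223L;
    private static final long M = 1L << 32; // modulus m = 2^32

    private long state; // current X_n, kept in [0, m)

    public Lcg(long seed) {
        this.state = Math.floorMod(seed, M); // X_0, the seed value
    }

    // Advances the recurrence and returns X_{n+1}.
    public long next() {
        state = Math.floorMod(A * state + C, M);
        return state;
    }

    public static void main(String[] args) {
        Lcg lcg = new Lcg(42L);
        for (int i = 0; i < 5; i++) {
            System.out.println(lcg.next());
        }
    }
}
```

Note how the determinism the text describes is explicit here: two instances seeded with the same X₀ produce identical sequences forever, and anyone who knows a, c, m, and one output can reproduce all later outputs.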
The problem with PRNGs is that computers are incapable of generating truly random numbers, since algorithms can only create deterministic sequences. Therefore, randomness needs to be derived from an outside source. True Random Number Generators (TRNGs) are so called because they derive, at some point(s), random input from physical phenomena. Multiple examples are provided in detail in the full paper. Many TRNGs use an externally derived value (e.g. the time) to determine a sampling place/time from the main physical source of randomness. Some TRNGs use quantum effects, which are inherently unpredictable; another, called HotBits, uses radioactive decay (again, an innately probabilistic phenomenon) to generate random bits; yet another site, random.org, employs the variability of atmospheric noise to generate random numbers.
PRNGs have several benefits. They are simple to design and employ: while an implementation can be done poorly, a simple LCG is very practical to create and is not computationally taxing. Without focused statistical analysis, the outputs appear random and unrelated, so for any casual application where true statistical randomness is not needed, such as a video game or generating sample input to test a function, a PRNG is a good fit. However, for any application in which equal probability of occurrence, even distribution, and unpredictability are crucial, TRNGs are the more appropriate choice.
A key issue in the development and application of random number generators is security, specifically resistance to attacks. When a PRNG is used, an attacker who intercepts a value in the sequence might discern the constants used and predict all future values. In the case of TRNGs, an attacker may be able to tamper with the physical source of randomness: against a generator based on atmospheric noise, an attacker could potentially generate noise at the site of generation, gaining some control over the output sequence, while a TRNG based on electrical phenomena could be intentionally disturbed via an electromagnetic field to bias the output. While no generator can be perfectly attack-proof and new methods of attack are always being devised, random number generators, true or pseudo, have to be designed with attack detection and prevention in mind.
Because randomness is defined statistically, it follows that the tests for qualifying randomness are based on statistical concepts. Two prominent test suites are the DIEHARD suite, designed by George Marsaglia, and a suite developed by the National Institute of Standards and Technology (NIST). The DIEHARD suite was published in the 1990s and was a standard until it was supplanted by the now de facto standard NIST suite (which notably overlaps with the DIEHARD suite in a few tests). Between the two, there is an abundance of testing material with diverse foci, and it was from these suites that the tests described and applied below were taken.
The simplest of these tests is the Monobit Frequency Test, which, given a string of generated bits, calculates the percentage occurrence of 0s and 1s. In a perfectly random bitstring of sufficient length, the percentages will even out. A similar test evaluates the frequency of occurrence of longer substrings (e.g., all two-bit and three-bit strings). Another test is the Runs Test, which measures the frequency and length of runs, or strings of consecutive identical bits. The idea behind this test is that while runs will naturally occur, runs that occur too regularly, follow patterns, or are too long are a sign of pseudo-randomness. Ueli Maurer of Princeton University proposed another test, which he called the “Universal Statistical Test,” based on the idea that no truly random sequence can be significantly compressed. If a sequence can be compressed beyond some threshold, this indicates patterns, redundancies, or abnormally long or frequent runs. Maurer deemed it a universal test because he claimed this single test could recognize symptoms of non-randomness that would take multiple other tests to catch.
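The core of the Monobit Frequency Test can be sketched as follows, assuming the common NIST-style formulation in which each bit is mapped to ±1 and the absolute sum is normalized by √n (the class name is ours; a full test would convert this statistic to a p-value):

```java
// Illustrative core statistic of the Monobit Frequency Test (class name is ours).
// Each bit is mapped to +1 or -1; for a balanced, random-looking string the
// normalized absolute sum |S| / sqrt(n) is small, while a biased string
// (far more 1s than 0s, or vice versa) yields a large value.
public class MonobitTest {
    public static double statistic(int[] bits) {
        long sum = 0;
        for (int b : bits) sum += (b == 1) ? 1 : -1; // 1 -> +1, 0 -> -1
        return Math.abs((double) sum) / Math.sqrt(bits.length);
    }
}
```

A string of 16,000 bits that is roughly half 1s scores near zero, while a string of all 1s scores √16000 ≈ 126, illustrating how the test flags an uneven frequency of 0s and 1s.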
Two PRNGs and two TRNGs were “put to the test” via the Monobit Frequency Test (MFT), the Runs Test (RT), and the Poker Test (PT), a substring frequency test using substrings of length 5. The PRNGs chosen were JavaRandom, the linear congruential method used in the java.util.Random class, and SecureRandom, another Java random number generator designed with a focus on cryptographic security. The TRNGs used were the online generators HotBits and random.org, both described previously. These generators were chosen because they were easily accessible and could be used to generate bits directly and in large quantities. Each of the three tests was run on each of the four generators, producing 16,000 random bits for five different test iterations. HotBits performed the most consistently of all the generators, but its performance was average compared to the others. SecureRandom fluctuated wildly in the results, despite the author’s preconception that it would be the best, given that it is intended for cryptographic applications. A detailed explanation of the results, including numeric averages, standard deviations, and the parameters used, is provided in the full paper.
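Bitstrings can be drawn from the two Java generators along these lines; this is an illustrative harness, not the paper’s actual test code, and the helper name is ours. Because SecureRandom extends java.util.Random, one method serves both sources:

```java
import java.security.SecureRandom;
import java.util.Random;

// Illustrative harness for drawing bitstrings from the two Java generators
// discussed in the text (the helper and its name are ours, not the paper's code).
// SecureRandom extends java.util.Random, so one method accepts either.
public class BitSource {
    public static int[] bitsFrom(Random rng, int n) {
        int[] bits = new int[n];
        for (int i = 0; i < n; i++) bits[i] = rng.nextBoolean() ? 1 : 0;
        return bits;
    }

    public static void main(String[] args) {
        int[] javaBits = bitsFrom(new Random(), 16_000);         // LCG-based java.util.Random
        int[] secureBits = bitsFrom(new SecureRandom(), 16_000); // cryptographically strong PRNG
        System.out.println(javaBits.length + " bits drawn from each generator, e.g. first bit: "
                + javaBits[0] + " / " + secureBits[0]);
    }
}
```

Bitstrings produced this way can be fed directly to frequency- and run-based tests like those described above.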
The Monte Carlo method was also used to test the random sequences generated: the integral estimate produced by Monte Carlo integration was compared to the known value of an integral. This allowed an evaluation of the distribution of the random numbers, because outputs concentrated in one location would skew the resulting area estimate. JavaRandom produced the closest estimate to the actual value of the integral, with random.org only slightly behind, while SecureRandom produced the estimate furthest from the integral’s value, a surprising result. Again, specific details regarding the inputs used, parameters and methods of testing, and results are provided in the full paper.
Random numbers are crucial to several facets of everyday life. There are multiple ways to generate them, but all methods must weigh several factors, including efficiency, difficulty of design and computation, the quality of randomness of the output, security, and the specific needs of the application.