The problem with random number generators

It is a commonly accepted fact that computers by themselves cannot generate
truly random numbers, and so most software relies on pseudo random numbers.his means that encryption and other applied uses of"random" numbers may not be as secure as users think. However, it ispossible to expand the boundaries of generating random numbers using acomputer. The most common way to do that is to use indirect input, suchas the system clock. Other methods rely on direct human input.Google provides an interesting approach to generating random numbers through indirect input. Since the Internet isconstantly changing, with Web pages and domains being continuously created,modified, and deleted, databases such as Google provide an extremely
large dataset from which to seed a random number generator. In this context, a seed is a number input as a starting point for the generator's algorithms.Many patterns in the data
could be used to generate a seed, but theymust vary enough to produce true random numbers and the generation must besimple. Since Google's GUI is too uniform (the best seed is
constantlychanging), it is preferable to find a common denominator within thepage results that the search engine returns, from which to generate theseed. Every page of Google query results is different, as Google filters outduplicate pages, thereby [1]providing a dataset of unique elements.Most Web pages contain text, and this text differs from page to page, sopages returned by Google should be an excellent source for extracting seeds.If this assumption is incorrect, then stepping outside of the algorithmwould provide an even more random seed. One numeric pattern found intext is how many words there are. By executing a word count on the text ofeach page
resulting from a Google search, you can theoretically producea random seed.However, Google does not return randomly chosen result pages, butinstead produces search results based on its Page Rank equation. If the samequery were to be submitted
twice, it would most likely produce the sameorder of pages, which would make our generated seeds too predictable. Abetter algorithm might look at the lowest page of results and/or
searchfor different terms on each execution.To further avoid predictability, the algorithm should only make uniquequeries,
which would require a queue of search terms. Extracting oneword from each of the result pages and appending it to a word list easilycreates this. There are several problems with this method. First, more commonly usedwords will populate the list in larger numbers and are likely to beabout the same topics. Second,storing the queue on the computer would makeit simpler to calculate what number will be generated next (a largeimplementation flaw). Third, dependency on the Internet would preventnon-connected devices from generating random numbers, and the speedwith which connected devices could generate results would be dependent ontheir Internet connections (to compensate for slow page hosts, atimeout mechanism could be employed). These flaws mean that implementation is more of a neat idea than afeasible tool.
The [2]Google Random Number Generator that I wroteprovides a simple example of this method of generating random numbers.User input sers have to type to interact with most computers. Typing provides aconsistent medium from which to generate seeds, if the seed generatorwere to calculate the time difference between keystrokes and store itas the seed. Since every user must log
into the system, logins provide aperfect choke point to generate seeds.The process could easily recalculate a seed each time a user logs on(by replacing the old seed, adding the two seeds, or some other method).This would also allow the
seed to be generated before the random numbergenerator requires it. Similar systems are [3]already in use, thoughthey often combine other pseudo random sources such as the system
clock. Then what?
Once you've generated a seed, by whatever means, you must store itsomewhere on the system
so that it can be called upon to initiate therandom number generator (seeds are not generated at the same timerandom numbers are). It would not be sound to store the seed on any datastorage device, including removable media, as other
users could access theseed, allowing them to predict the random number that is generated.Therefore, the seed should be stored in protected memory.In order to access the protected memory and write the seed to it, aseed generator must
run with system level access, which is easily done bywriting a Linux or BSD system kernel module. Once the kernel has storedthe seed, it's a simple matter for a client program to
request the seedfrom the kernel and then pass it to the random number generator, whichwill run the seed through its algorithm.
If one cannot use a kernel module due to insufficient access or theinability to write a module, one could do the same job with aconstantly running process (service) that acts like a server, whose sole purposeis storing and returning the seed when it is called by another program.While running with a UID of 0, the client would use interprocesscommunication mechanisms to request the seed from the seed service.Theseed service could be started at system boot, most likely through[4]init.Final thoughts Pseudo random numbers may be good enough for some purposes, butapplications that need to be secure, like SSH or [5]GnuPG, must havetruly random seeds.It would be a simple matter to employ one of the above algorithms, orsome other one,
to generate truly random seeds in the operating system.Even establishing a standard hardware device to assist in thegeneration of random numbers would be preferable
to the pseudo random mechanismsin place today. While a hardware approach would likely be lesscost-effective for both the computer manufacturer and consumer,most ofthe computer friendly user market would likely accept the tradeoff.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s