Reliable way to check the output
Easy. NIST publish a standard definitive random number bitstream. The bitstream from any new random number generator is compared with that and if the two match then it passes.
Do I get some sort of reward?
America's National Institute of Standards and Technology (NIST) is looking for public input into its long-running project to improve cryptography. The recommendation NIST's put up for discussion covers the design principles and requirements for random bit generators, and tests to validate entropy sources. It's the entropy …
You joke, and I may get sternly corrected for this...
But some mathematical analysis is possible.
Say I try to zip a text document. I then try to zip some random data generated at the same bit length.
I would need to check the text document against quite a few random strings, but the text document should show itself to be more compressible than a random string <on average>.
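As a rough sketch of that test, assuming zlib's DEFLATE as the compressor: redundant text shrinks substantially, while random bytes of the same length barely shrink at all (and usually grow slightly from format overhead).

```python
import os
import zlib

def compressed_ratio(data: bytes) -> float:
    """Compressed size over original size: lower means more redundant."""
    return len(zlib.compress(data, 9)) / len(data)

# Redundant, text-like data: repeated English words.
text = b"the quick brown fox jumps over the lazy dog " * 100

# Random data generated at the same length.
rand = os.urandom(len(text))

print(f"text ratio:   {compressed_ratio(text):.3f}")  # far below 1.0
print(f"random ratio: {compressed_ratio(rand):.3f}")  # near or above 1.0
```

As the later comments note, compressing well below 1.0 proves non-randomness, but failing to compress proves nothing: plenty of deterministic streams are incompressible by Lempel-Ziv.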
This is of course not true for other things, like video, since it may already be compressed with a lossy method.
Data with redundancy in it is different from true random data; however, data with little redundancy is very similar to, or arguably indistinguishable from, random data.
Our language has a lot of redundant data in it, so we can use that to find patterns and match them to other existing patterns (that is, decrypt it). Random data, by its definition, is supposed to have no real pattern. (For example, if your RNG gives out four 1s, then four 0s, and repeats that on every result, it's not random and is giving out a possible signal. See XKCD link above! :P )
Things like steganography can be detected when comparing to the original source (if available elsewhere), etc.
I read one paper where random strings were used to test against scientific theories. If the random string made a better prediction, your theory would officially be "worse than an absolute guess!". :D
Compression before encryption is good practice to remove excess (entropy-lowering) redundancy. But although Lempel-Ziv compressibility is a good test for high levels of redundancy, the lack of the same is a necessary but not (nearly) sufficient property to judge the input as random.
>Compression before encryption is good practice
I thought I read this may not be true in some cases. Eliminating almost all repeating patterns can be almost as bad as having mostly repeating patterns or something like that for cryptanalysis.
Edit: The answer is even more complex than that; it has more to do with some implementations of compression in some protocols leaking data. For a more nuanced explanation: http://crypto.stackexchange.com/questions/15138/how-does-compression-before-encryption-leak-info-about-the-input
But some mathematical analysis is possible.
Well, yes. This has been a perennial topic in computer science (and cognate fields) since there was such a thing; thus we have, for example, von Neumann's 1951 "Various techniques" paper on removing bias and the like. Knuth discusses PRNGs at some length in the second volume of TAoCP. If there weren't any useful analysis to be had, presumably NIST wouldn't be asking for any.
Marsaglia's Diehard collection of tests has been freely available for over two decades.
Anyone working with pseudorandom sequences for any nontrivial purpose ought to be aware of these things.
Personally, if I were going to use entropy-encoding compression to estimate the information entropy of a sequence, I'd prefer a predictive matcher like the BWT or PPM, followed by arithmetic encoding. I suspect (though I haven't tried to prove) that would be less susceptible to missing larger structure in the sequence.
It should be possible to make an analogue random source using the internal noise of a PN junction and turn that into a bit stream for the job.
But this should be a separate small chip where the design is fully published and anyone with a tin foil hat and scanning electron microscope (what, you did not get one free with the hat?) can grind off the package top and see the chip below matches the published and validated design exactly.
Only by that route can the suspicion of, for example, Intel's random number instruction be avoided (and the somewhat ignorant discussion about its use with other sources, see http://www.theregister.co.uk/2013/09/10/torvalds_on_rrrand_nsa_gchq/ for more).
This sounds right to me --- the sort of device that you can plug into a USB port to read, but made from simple components (capacitors, resistors, transistors) that you can verify (or assemble yourself). I've seen some circuit diagrams but we really need something very simple indeed. People add complexity to circuits by adding clever stuff to ensure random weighting* but this seems unnecessary and adds the kind of circuitry that could disguise randomness-subverting badness.
Maybe what we need is something physical that we can verify by eye -- like a lotto ball machine. We just need something that can generate numbers much faster. Perhaps a shaker full of tiny particles, read by a CCD?
* if you have a random bit stream which is suitable in every respect other than weighting (ratio of 1s and 0s) you can create a perfectly weighted stream from it by sampling non-overlapping pairs. I think it was Von Neumann who invented this - you read bits pairwise, discarding all pairs where the bits are equal. You convert the remaining pairs into 1s and 0s using the code 01->0; 10->1 (or vice versa) and bingo, you have a bit stream balanced perfectly 50:50 into 1s and 0s. This is because if the bits are independent then the probabilities of 01 and 10 are equal, whatever the probabilities of 0 and 1 (and hence 00 and 11, which have unknown probabilities, are discarded).
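The pairwise trick described above can be sketched in a few lines of Python; the 90% bias figure below is just an arbitrary example of a bad-but-independent source:

```python
import random

def von_neumann_debias(bits):
    """Von Neumann extractor: read bits in non-overlapping pairs,
    discard 00 and 11, and map 01 -> 0, 10 -> 1. The output is
    unbiased as long as the input bits are independent, whatever
    the bias of the source."""
    out = []
    it = iter(bits)
    for a, b in zip(it, it):  # non-overlapping pairs
        if a != b:
            out.append(0 if (a, b) == (0, 1) else 1)
    return out

# A heavily biased but independent source: roughly 90% ones.
rng = random.Random(1)
biased = [1 if rng.random() < 0.9 else 0 for _ in range(100_000)]

debiased = von_neumann_debias(biased)
print(sum(debiased) / len(debiased))  # close to 0.5
```

Note the cost: the more biased the source, the more pairs get discarded, so a very skewed source yields few output bits per input bit — which is exactly the "all they can do is slow down your RNG" point made further down the thread.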
but made from simple components (capacitors, resistors, transistors)
How about using the microphone input of a computer to sample noise from a diode or transistor wired with a suitable voltage in the reverse direction? The semiconductor noise is supposed to be quantum-mechanical, so unpredictable even in theory.
The speed of getting random bits might be a problem.
This has been done. I raised a similar point some time ago and it is available in the form of a USB 'dongle' that contains a noisy zener diode (if I remember correctly). I can't remember when and in which article someone gave the link to its maker's website.
If you Google "USB random number generator", there are lots of hits. The bits are out there.
Here's a handy list, including the devices you were looking for, and others with an open hardware specification.
You know, I tried the shaker can thing once. I bought black and white sand from a craft store, mixed it up good and put it in a wooden frame with a glass window, figuring I could scan it to get some random bits. It was fairly disappointing. It actually came out with small wave-like patterns visible to the naked eye. I guess the two kinds of sand were different weights or textures or something.
Random is hard.
...are very hard to find, outside of the machines used to generate government statistics.
If such a machine is a virtual impossibility, it must have finite improbability, but before trying work out the details, I'm going to need a fresh cup of really hot tea. Anyone got a spare atomic vector plotter...?
I have lots of random numbers to sell. Doing spectroscopy, there are lots of failed experiments where the noise level is too high to do anything useful with the data. Would be nice to get some additional funding by selling some random bits.
If you don't want to buy from me, you can always set up your own random number generator: simply get some decently sized particle detector and start counting.
"Then the spooks start firing fast neutrons at your house, to adjust the bias in your random noise generator." -- AC
LOL but (and this is relevant to some other side-channel attacks) if you use the Von Neumann de-biassing method above, all they can do is slow down your RNG. Bombarding you with neutrons might actually improve the quality of the output, but I guess slowing it down (and hastening your personal End-of-Life) might be an approach if they get desperate!
For the past year, I've been running tests on a couple of random number generators. I grab a random 32-bit integer from each, and each number is plotted on a 65,536 x 65,536 bitmap. Each time a number comes up, its pixel's color is bumped up by one. Every so often, the bitmap gets saved to an external storage array so that the image can be viewed from another machine; any number that is favored by the RNG will show up as a different color in the resulting image.
Doesn't require all that many resources to run such a test: 16 GB of RAM to store the results, so another 8 for the OS and the RNG itself would be more than enough. The bottleneck ends up being the RNG itself. So a quad-core system with 32 GB of RAM and 100 MB/s of storage could create an image every 5 minutes. I've been running my tests on just such a machine (a pair of old HP DL360 G5s with a shared HP MSA2000 G3 storage array; the second server is there to share out the images without bothering the testing machine).
I'm working on publishing my results in the next few months once I get a statistically significant sample (so far each number is only averaging 40-45 hits). I've only just started, after all. I'm hoping to get some funding to run a bunch of tests in parallel and get some help with collating the data (my work commitments have been taking priority).
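A scaled-down sketch of the same counting approach (16-bit values on a 256 x 256 "bitmap" rather than 32-bit, with Python's own PRNG standing in for the generator under test):

```python
import random

BITS = 16                    # scaled down from the 32-bit original
SIDE = 1 << (BITS // 2)      # a 256 x 256 "bitmap"
counts = [0] * (1 << BITS)

rng = random.Random(42)      # stand-in for the RNG under test
samples = 20 * len(counts)   # aim for ~20 hits per cell on average

for _ in range(samples):
    counts[rng.getrandbits(BITS)] += 1

# Pixel (x, y) of the image would be counts[y * SIDE + x]; a value
# strongly favored by the RNG would stand out far above the mean.
mean = samples / len(counts)
print(f"mean hits/cell: {mean:.1f}, max: {max(counts)}")
```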
Yup. A "heat map" like this can only show you very pathological cases where the RNG is really skewed. Even then, the mind is great at picking up patterns that may or may not be there, so you could be looking at a map and thinking that it looks "unrandom" but really is still within the statistical bounds for what is random.
You're better off doing a chi-squared test if you just want to check that the generated numbers are well distributed. As Adam 1 said, though, this won't help if the number stream has some sort of discernible correlation between terms. Chi-squared is pretty crude, but it's a good sanity check.
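A minimal Python version of that chi-squared sanity check. Note how a plain counter sails through it despite being the least random stream imaginable: chi-squared sees only coverage, not correlation between successive values, which is exactly the limitation mentioned above.

```python
import random

def chi_squared_uniform(samples, bins):
    """Pearson chi-squared statistic against a uniform distribution."""
    counts = [0] * bins
    for s in samples:
        counts[s] += 1
    expected = len(samples) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

bins, n = 256, 100_000
rng = random.Random(0)

stat = chi_squared_uniform([rng.randrange(bins) for _ in range(n)], bins)
# With k bins the statistic should sit near k - 1 = 255; a value many
# standard deviations (sqrt(2 * 255) ~ 22.6) away is suspicious.
print(f"PRNG:    {stat:.1f} (expect ~{bins - 1})")

# A plain counter "passes" -- its suspiciously tiny statistic reflects
# an impossibly even distribution, not randomness.
counter_stat = chi_squared_uniform([i % bins for i in range(n)], bins)
print(f"counter: {counter_stat:.1f}")
```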
"For the past year, I've been running tests on a couple random number generators. I grab a random 32-bit integer form it, each number is plotted on a 65,536 x 65,536 bitmap. Each time a number comes up, its pixel's color is bumped up by one. Every so often, the bitmap gets saved to an external storage array so that the image can be viewed from another machine, any number that is favored by the RNG will show up as a different color in the resulting image." -- Crazy Operations Guy
It's a good first attempt at RNG visualization, but I'm afraid it is rather flawed; a quick example will show why: what if you replaced your RNG with a counter? It is the lack of correlation between one bit and the next (more exactly, that any given bit in no way depends on any of the previous history) that is the crucial thing, rather than completely even coverage (as explained in my earlier post, you can extract a smaller number of perfectly distributed random bits from an imperfectly distributed random source as long as each bit is independent).
I'm addressing the issue of randomness vs distribution in my next test by measuring the percentage of cache hits when the processor is updating the counts. Since the counts are stored at memory address 0_<prefix><32-bit Number>00, nearby numbers fall within the same cached block of about 3 million results (the count for each number takes 4 bytes of RAM and the processor has 12 MB of cache).
In the case of sequential numbers, I'd theoretically see a cache hit-rate of 99.99997% on the second chip. I'd also see a block of a specific color stream across the images as I'm checking them. I'm also looking at the intermediate images to see if there are any purely uniform spots. I expect to see a small amount of deviation in each image.
But then, any sort of sequential chicanery would be noticed almost immediately by anyone running a basic check on the RNG; what I'm focusing on is the long-term randomness of an RNG.
1) Optofeedback randomly selecting Youtube videos, XOR32'd with each other
2) Noise in a Josephson junction array a hair below Tc
2) should be doable with simple laser cooling, you can get well below LN2 temps now thanks to CdS based compounds.
I did often wonder whether the "cool-chips" technology got snapped up by the "Powers-that-Be" for use in their quantum computer and laser systems.
My research suggests that D-wave is just the tip of the (extremely cold) iceberg, making a quantum system work at more useful temperatures is an ongoing research project of mine.
See published work...
Very funny!
FWIW this is the same AC who discovered the row leakage effects in DDR3 months before Google did (at least publicly), but didn't realize *why* it was happening.
iTunes is actually really good at rooting out flaky memory; also there seems to be a link between this effect and x520 machines only working with certain RAM modules and not others, so it could be a BIOS issue leading to spot overheating in use, exacerbated by voltage fluctuations.
I so wish I'd published a paper on this. I kept the modules for reference, and they work correctly on another machine, which would seem to confirm my hypothesis.
I've got some test pellets here that show clear signs of room temperature superconductivity but no serious researchers want to even consider that this could be a breakthrough of Gaussian proportions, especially as the effect only needs off the shelf components and simple lab chemicals to replicate. aka HOPG, lithium and MEK/acetone. 10-20% resistance drop!!
This post has been deleted by its author
A friend and I had an idea for an "infinite adventure game" (text based dungeon) for which we needed an infinitely long, yet reproducible, random sequence. We were going to use decimal places of pi, starting at a predetermined place, given that pi is easy to calculate and (afaik) has passed every test of randomness thrown at it. I guess they don't want it to be reproducible, but I wonder whether a similar method could be used?
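For what it's worth, the "infinitely long yet reproducible" requirement doesn't strictly need pi: any seeded PRNG has the same property, with the seed playing the role of the predetermined starting place. A minimal Python sketch (the `digit_stream` name is made up for illustration):

```python
import random

def digit_stream(seed):
    """Endless, reproducible stream of decimal digits. A seeded PRNG
    stands in for the digits of pi: the seed is the 'predetermined
    starting place'."""
    rng = random.Random(seed)
    while True:
        yield rng.randrange(10)

s1 = digit_stream("level-1")
s2 = digit_stream("level-1")
first = [next(s1) for _ in range(20)]
again = [next(s2) for _ in range(20)]
print(first == again)  # True: same seed, same infinite sequence
```

As the reply below notes, reproducibility is exactly what you do *not* want for cryptography, but for a dungeon generator it's the whole point.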
The word "random" has many different meanings and people tend to confuse them. In this particular case, cryptography, the meaning we care about is "cannot be predicted by an attacker". So digits of pi are no good. And data derived from a quantum mechanical process are no good, either, if there's any possibility that an attacker might be able to observe the same process or intercept the data. So you're better off with crap generated inside the server box (such as a microphone picking up noise from the fan) than with any high-quality evenly distributed quantum-mechanical stuff that comes in over the (insecure) network.
It wouldn't take long to find a sequence of digits of pi... Google the Boyer-Moore algorithm. In fact, if you can get any chunk of a one-time pad with moderate entropy, it's not that hard to search a large canon of alphanumeric sequences. Google "no very favourable idea of the age" and you'll find Austen's Northanger Abbey in no time.
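The search alluded to here can be sketched with the simplified Boyer-Moore-Horspool variant, run against a short hard-coded prefix of pi's digits (a real search would scan a much larger precomputed expansion):

```python
def horspool_find(text: str, pattern: str) -> int:
    """Boyer-Moore-Horspool substring search (a simplified Boyer-Moore).
    Returns the index of the first match, or -1. The bad-character
    table lets the search skip ahead by up to len(pattern) characters
    at a time, which is what makes scanning a huge digit expansion
    practical."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Skip distance for each character of the pattern except the last.
    skip = {pattern[i]: m - 1 - i for i in range(m - 1)}
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            return i
        i += skip.get(text[i + m - 1], m)
    return -1

pi_digits = "14159265358979323846264338327950288419716939937510"
print(horspool_find(pi_digits, "26433"))  # 20
```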
use a random number generator to choose which part of PI to use
But then it fails another test of an RNG that's suitable for crypto uses: it'll be susceptible to timing attacks, assuming that you have to calculate the chosen bits on demand.
Of course, if you have enough disk space (we're talking Terabytes), you can pre-calculate the digits (and somehow make sure that seek times don't allow for a more subtle timing attack), but then it fails the practicality test.
"So you're better off with crap generated inside the server box (such as a microphone picking up noise from the fan)"
The microphone might also pick up the voice that you are trying to encrypt, and the fan (or other part directly connected to the power supply) will vary in sympathy with processor load, potentially leaking entropy or even clear text into the random generator, compromising the encryption.
> And data derived from a quantum mechanical process are no good, either, if there's any possibility that an attacker might be able to observe the same process or intercept the data
At a quantum level, the observer would collapse the state. They can't passively observe because cloning is impossible.
https://en.m.wikipedia.org/wiki/No-cloning_theorem
Quantum == weird
"...infinitely long, yet reproducible, random sequence."
The nice thing about infinities is that they can be endlessly compressed. Unfortunately, they're still infinite.
Here, look up some YouTube videos on this.
https://en.wikipedia.org/wiki/Banach%E2%80%93Tarski_paradox
Start with Mathologer, then Vsauce.
Bring a bucket to carry home the exploded bits of your brain.
Only a tenuous connection to this thread, but still very entertaining.
"SP 800-90B is one of three documents in the 800-90 series: 800-90A specifies random number generation algorithms* ([purportedly] aiming, along the way, to avoid a repeat of the Dual_EC_DRBG scandal exposed by Edward Snowden's document dumps), while SP 800-90C specifies how to put entropy sources and random number algorithms together at the system level."
*sic
2nd line of article, my emphasis:
"The recommendation NIST's put up for discussion covers the design principles and requirements for random bit generators, and tests to validate entropy sources."
Gideon 1, my emphasis
"Verification not generation ... It's interesting how many Commentards didn't understand the article."
Errm, yes?
On a more serious note, given the difficulties in verifiability (not just doing it, but doing it in a way that is widely understood), I think verifiable generation (quantum & other physical methods proposed above by commentards including myself "who didn't understand the article") would be a better approach than new methods for verification. Given that there are any number of deterministic sequences (e.g. digits of pi, mentioned above) that satisfy all existing tests for randomness and (as far as my limited mathematical understanding goes) are likely to continue to do so, verifiable generation seems to me a much more promising area than verification of deterministic generators.
I love the fact that generating true randomness is a very difficult problem to solve. Also, if you >think< you've solved it, it's hard to verify.
Going out on an existential limb, it could be that true randomness is impossible because there's an innate order to the universe that can't be broken....
(off to take my meds now)
>Going out on an existential limb, it could be that true randomness is impossible because there's an >innate order to the universe that can't be broken....
You're in good company. For Albert Einstein's famous statement "God does not play dice with the universe" to be true, randomness, where we think we've found it, would have to be an emergent as opposed to an intrinsic property of the universe.
>I love the fact that generating true randomness is a very difficult problem to solve.
Maybe artificially, but radioactive decay makes finding true randomness fairly trivial.
>randomness where we think we've found this, would be an emergent as opposed to an intrinsic property of the universe.
It is an intrinsic property of the universe; see quantum tunnelling (which we often use for looking at things at the atomic scale), which explains why it is impossible to have a true closed system in our universe. An outside particle can always tunnel in at any time, randomly.
"radioactive decay makes finding true randomness fairly trivial" - Just because you can't predict it, that doesn't make it random. First person to predict radioactive decay gets a Nobel Prize and invalidates the whole of quantum theory, exposing the secrets of everyone who'd invested in radioactive RNGs would be the icing on top.
Solar axions being linked to modified radioactive decay in certain rare isotopes (54Mn comes to mind) is consistent with a minimally modified Standard Model (MMSM) however a lot of contradictory data exists here suggesting two different effects at once. (or two axion domain walls interacting)
See http://axion-wimp2012.desy.de/e102694/e102699/e163667/Sturrock_P.pdf
It's also possible that this could be an as-yet-undiscovered particle, also of solar origin, but that would virtually invalidate the Standard Model as it would also imply an interaction with something other than gravity.
Perhaps the mysterious 750GeV "OMG" particle CERN may or may not have found is actually something and not a mere statistical anomaly.