Belgian boffins breed 'digital canaries' to test your random numbers

Belgian boffins have proposed adding what they call “Canary Numbers” to random number generators (RNGs), in the hope and expectation they will fall off the twig if something goes wrong. In this International Association for Cryptologic Research (IACR) paper, Vladimir Rozic, Bohan Yang, Nele Mentens and Ingrid Verbauwhede write …

  1. Unep Eurobats
    Coat

    But can you trust a canary

    not to tweet?

  2. jake Silver badge

    Uh ... harmonic discordance here ...

    Adding more input for crackers is never a good idea, from a security point of view.

    1. Tomato42
      Boffin

      Re: Uh ... harmonic discordance here ...

      I don't think the idea was to expose these numbers anywhere near end-user code, but rather to have two modules - one HWRNG and one verifier. This way you can have an independently designed RNG and a separate module that performs the runtime checking.

      This is rather a good idea, as we know how to design whitening functions whose output passes all statistical checks while being fed no entropy at all. In other words, an RNG passing statistical tests doesn't mean it's a good RNG, it just means it's not horribly broken.
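      To illustrate that point, here's a minimal sketch (Python, entirely my own construction): a zero-entropy counter run through SHA-256 whitening sails through a simple monobit check.

      ```python
      import hashlib

      # Zero-entropy "source": a predictable counter, whitened with SHA-256.
      def whitened_stream(n_blocks):
          for i in range(n_blocks):
              yield hashlib.sha256(i.to_bytes(8, "big")).digest()

      # Simple monobit check: the fraction of one-bits should be ~0.5.
      bits = ones = 0
      for block in whitened_stream(1000):
          for byte in block:
              ones += bin(byte).count("1")
              bits += 8

      print(f"fraction of ones: {ones / bits:.4f}")  # ~0.5000: looks random, isn't
      ```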

  3. TRT Silver badge

    I won't even begin...

    to try to understand this article. In terms of simplifying the aims and achievements of the researchers for the bears of small brain, this one's fallen off the twig.

    1. Dan 55 Silver badge

      Re: I won't even begin...

      Two sets of random numbers are generated, one with high entropy and one with low. The scheme checks the statistical distribution of the low-entropy set (the canary numbers). They should be evenly distributed, like you'd expect repeatedly flipping a coin to turn out roughly 50/50. If they're not, then something's up. In theory the canary numbers will be affected first if there's something up with the RNG, because their entropy is lower.
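      Something along these lines, say (a toy sketch of my own; the tolerance is illustrative):

      ```python
      import random

      # Treat the canary stream as coin flips and alarm if the
      # ones/zeroes ratio drifts too far from 50/50.
      def canary_ok(bits, tolerance=0.01):
          return abs(sum(bits) / len(bits) - 0.5) < tolerance

      healthy = [random.getrandbits(1) for _ in range(100_000)]
      biased = [1 if random.random() < 0.6 else 0 for _ in range(100_000)]
      print(canary_ok(healthy))  # True
      print(canary_ok(biased))   # False: something's up
      ```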

      1. Mage Silver badge

        Re: In theory the canary number will be affected first

        Like a Zener diode thermal noise source getting modulated by power supply noise (SMPSU ultrasonic or mains hum).

        A Zener diode feeding an ADC sounds like a nice idea for a RNG, but it's actually quite hard to do it properly.

        1. This post has been deleted by its author

          1. JeffyPoooh
            Pint

            "...radioactive decay."

            Symon says, "...radioactive decay. It's easy to make one at home!"

            Increasing the rate of random bits might cause other issues. Like body parts falling off...

      2. Arctic fox
        Thumb Up

        @Dan 55 Thanks for that old chap. Much obliged, clarified the issue for this bear of limited...

        .......brain in the area concerned at any rate. See icon.

      3. Doctor Syntax Silver badge

        Re: I won't even begin...

        "There are two sets of random numbers generated, one with high entropy and one with low."

        I get that. But why should the latter tell you anything about the former? Or, given that they're talking about TRNGs based on noise sources, how do you get two sets of numbers out of them other than by diverting every other number (or some other percentage) from one set to the other? And in that case, how do you have different entropies for the sets?

        In short, this seems like a remarkably low content article.

        1. Anonymous Coward
          Anonymous Coward

          Re: I won't even begin...

          Probably by way of skimming only a low bit count out of the generator to get your low-entropy set (say your standard set uses 32 or 64 bits, but your canary set only uses 8 bits). A 32-bit number has a numerical range of over 4 billion while an 8-bit one only has 256. Analyzing a stream of 8-bit numbers should be much quicker than scanning a stream of 32-bit ones, so you use the canary set as an early warning system.
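          In code, the skim might look like this (a sketch; the split is my own guess at the mechanism):

          ```python
          import random

          # Full-width outputs go to consumers; the low byte of each
          # is peeled off as the quicker-to-analyze canary sample.
          def canary_byte(word32):
              return word32 & 0xFF

          words = [random.getrandbits(32) for _ in range(5)]
          print([f"{w:08x}" for w in words])
          print([canary_byte(w) for w in words])
          ```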

          1. Doctor Syntax Silver badge

            Re: I won't even begin...

            "Probably by way of skimming only a low bit count out of the generator"

            You still have the problem of ensuring that the low bit count numbers repeat any patterns in the high bit count. To take an extreme example: you take the low 8 bits and they look random, but the top 8 bits are cycling through a short repeated sequence.
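            A toy construction of that failure mode (numbers entirely made up):

            ```python
            import random

            # Low 8 bits genuinely random, high bits cycling through a
            # short fixed sequence: the low-byte canary stays happy while
            # the full output is badly broken.
            cycle = [0x1111, 0x2222, 0x3333]
            samples = [(cycle[i % 3] << 8) | random.getrandbits(8)
                       for i in range(30_000)]

            ones = sum(bin(s & 0xFF).count("1") for s in samples)
            print(f"low-byte ones fraction: {ones / (8 * len(samples)):.3f}")  # ~0.5
            print(sorted({s >> 8 for s in samples}))  # only three values: broken
            ```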

      4. Bill Gray

        Re: I won't even begin...

        @Dan 55: Ermm, I'm pretty sure this isn't what they're doing. (See my post below.)

        The second (low-entropy) set of numbers is the raw thermal noise data, or the timing of radioactive decay events, or the timing of key presses/mouse movements for a desktop computer. That data is semi-random, but will have patterns in it, including (possibly) not having bits set 50/50, or correlations between bits. This raw, low-entropy data is "stirred" (I think the Linux kernel used to do it with a primitive polynomial) and hashed to create a smaller amount of high-entropy, "really random" output.

        The idea these folks are proposing is that you send, say, 90% of your raw data to be stirred and hashed, but send the remaining 10% off still in raw form. That way, if your radioactive event detector is broken and sending all zeroes, you'll see it. (Stirring and hashing can make even a stream of zeroes look random. Keep in mind that the stirring will include the previous state.)
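        As a sketch, the plumbing might be no more than this (Python; the 9:1 ratio is from the description above, the hashing is my stand-in for the kernel's stirring):

        ```python
        import hashlib, os

        pool = hashlib.sha256()  # stand-in for the stir-and-hash pool
        canary = bytearray()     # raw, inspectable samples

        # Send nine of every ten raw samples to conditioning; expose the
        # tenth raw, so a dead source is visible before whitening hides it.
        for i, sample in enumerate(os.urandom(1000)):
            if i % 10 == 9:
                canary.append(sample)
            else:
                pool.update(bytes([sample]))

        print(pool.hexdigest()[:16], len(canary))
        ```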

        1. TRT Silver badge

          Re: I won't even begin...

          You lost me at numbers having entropy. And then confused me by talking about thermal noise, which I know *is* something to do with entropy.

  4. Anonymous Coward
    Anonymous Coward

    I call dibs

    On calling the Raspberry Pi implementation of this (should there be one) the Tweety Pi.

    1. Captain DaFt

      Re: I call dibs

      OK, but only if you make the alert message, "I tawt I taw a putty tat!"

  5. Bill Gray

    I _think_ this is what they're doing:

    Your average TRNG may work by taking, say, images of a lava lamp (see http://lavarnd.org/index.html; note that they now take images of the noise from the camera). This provides correlated bits that are not entirely random. But if you take, say, a 10 KByte sample and hash it, 100 bytes taken from it will be "random" (in the sense that an outsider, without access to the camera, has no way to say that a given 100 bytes is more likely than any other 100 bytes).
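    The hash step, in miniature (my own sketch; I've used SHA-256, so the output block is 32 bytes rather than 100):

    ```python
    import hashlib, os

    # Compress a large, correlated raw sample down to a short output,
    # chaining the previous digest in as state.
    state = b"\x00" * 32

    def condition(raw_sample):
        global state
        state = hashlib.sha256(state + raw_sample).digest()
        return state

    print(condition(os.urandom(10 * 1024)).hex())  # 10 KB in, 32 bytes out
    ```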

    The problem is that if the camera died and started providing all zeroes, or zeroes alternating with ones, or something like that, the hashed output still would look pretty darn random. A while back, I made a TRNG by taking a small motor, attaching an (empty) peanut butter jar to the end of it, and starting it up with some dice in it. I pointed a cheap Webcam at it, and set it up to take the stream of image data and scramble/hash it in the above manner. (Note that I did this out of curiosity. I'm not currently doing anything for which I'd need that level of security anyway.)

    However, I had it send nine out of ten images to the scrambling code, and the tenth to a plain old image file. As long as that image was of a peanut butter jar tumbling dice, I figured things were probably okay. If it was blank or maybe the motor had died and the dice weren't tumbling, I'd know I was getting non-random numbers. I won't swear to it that this is what the authors are contemplating, but it sounds like it.

    In theory, you may be revealing some data if people can interpolate from images 1 and 11 to say something about images 2 through 10, but your sampling rate really ought to be low enough that consecutive images won't be correlated anyway.

    1. Anonymous Coward
      Anonymous Coward

      Re: I _think_ this is what they're doing:

      I once tried a video entropy daemon. Thing was, it's hard to get the noise right even with whitening (try running your stuff through a chi-squared analysis, for example). I also have to wonder how fast such a rig can provide truly random data. Throughput matters in an enterprise setting, especially one dealing with a lot of encrypted data or connections, which is why many have to rely on expensive dedicated devices. Thing is, such devices are difficult to get into to see whether they're malfunctioning, so the only thing you have to work with is the data stream. Doing a controlled sampling seems to be the best solution in that circumstance.
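      The chi-squared check mentioned above, for the curious (a sketch; stdlib only, and the healthy/broken inputs are illustrative):

      ```python
      from collections import Counter
      import os

      # Goodness-of-fit on byte frequencies against the uniform
      # expectation. For 255 degrees of freedom a healthy source
      # scores near 255; far larger values indicate bias.
      def chi_squared_bytes(data):
          counts = Counter(data)
          expected = len(data) / 256
          return sum((counts.get(b, 0) - expected) ** 2 / expected
                     for b in range(256))

      print(chi_squared_bytes(os.urandom(1 << 20)))   # ~255: plausible
      print(chi_squared_bytes(b"abcd" * (1 << 18)))   # huge: fails badly
      ```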

  6. Queeg
    Headmaster

    Not to be a pedant...

    but that's a Blue tit...

  7. anonymous boring coward Silver badge

    "testing the randomness of a large random number"

    I would be REALLY interested to see how that could be done...
