Can't quite cram a working AI onto a $1 2KB microcontroller? Just get a PC to do it

Eggheads have devised software that can automatically produce machine-learning models small enough to run inside tiny microcontrollers. Microcontroller units (MCUs) are pretty darn common: they can be found wedged inside everything from microwaves and washing machines, to cars and server motherboards, running relatively simple …

  1. Paul Crawford Silver badge
    Facepalm

    Oh great, we will finally have to argue with the toaster over our choice of bready snacks.

    1. Chunky Munky
      Terminator

      Gratuitous link

      https://www.youtube.com/watch?v=lhnN4eUiei4

      Let it be a warning!

      1. Wellyboot Silver badge

        Re: Gratuitous link

        Stockpile toasters now! They'll be retro gold in 20 years.

    2. Anonymous Coward
      Anonymous Coward

      "Oh Crap"

    3. Aitor 1 Silver badge

      Non learning

      Plus this comes learned from the factory.. no luck teaching new tricks to the toaster..

    4. Lee D Silver badge

      "Ah.... so you're a waffle man..."

      (Red Dwarf, for those who don't get the reference)

      1. Dr. Mouse Silver badge

        Have an upvote, you beat me to it!

    5. JLV Silver badge
      Headmaster

      I am sorry, Dave. I can’t do that. Your* too fat.

      * yeah, well of course a $1 chip will suck more at grammar than the average online poster.

  2. DropBear Silver badge

    Never mind the model, how did the MCU get access to the _picture_ in the first place?!? Because at three one-byte values per RGB pixel, a 26x26 image fills your 2K RAM completely. There's a reason cameras tend to get attached to Raspberry Pis and not so much to Arduino Unos...
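    DropBear's arithmetic is easy to check; a quick sketch (the 2KB figure assumes an ATmega328P-class part, as in the article):

    ```c
    #include <stdio.h>

    int main(void) {
        /* 26x26 RGB image, one byte per channel */
        unsigned bytes = 26 * 26 * 3;   /* 2028 bytes */
        unsigned ram   = 2 * 1024;      /* 2KB of SRAM on an ATmega328P-class MCU */
        printf("image buffer: %u bytes, RAM: %u bytes, left over: %u\n",
               bytes, ram, ram - bytes);
        return 0;
    }
    ```

    That leaves a whole 20 bytes for the stack, the model, and everything else.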

    1. This post has been deleted by its author

      1. TechnicalBen Silver badge
        Coat

        Re: And I do wonder how this would work on a Pi Zero

        You put your trust in a Nematode driving your car.

        Me? I'm walking away from that idea!

        1. Tom 7 Silver badge

          Re: And I do wonder how this would work on a Pi Zero

          I'd steer clear of the pavement if I was you!

          The nematode brain modelled as a NN managed character recognition using considerably fewer neurons than 'standard' methods. I was alluding to the idea that AI will probably come to be made from 'brain units', sort of pre-wired NNs that perform certain 'brain functions', from which we can produce far more reliable AI, and I thought the ATmega328P might be worth looking at for hosting these components. Looking at the spec, though, they run at 20 MIPS while a Pi Zero GPU can achieve 24 Gflops for ten times the price, so I deleted the post you were answering and decided to reply to yours!

          I do however believe that AI is almost pre-nascent at the moment. When people look at actual brains in nematodes, insects and eventually mammals, we will be able to identify processes that make up intelligence and behaviour developed and refined over some 600 million years of evolution. We're barely modelling the AI equivalent of NAND gates at the moment!

          1. Korev Silver badge
            Coat

            Re: And I do wonder how this would work on a Pi Zero

            Were the data saved in WORM storage?

            1. Tom 7 Silver badge

              Re: And I do wonder how this would work on a Pi Zero

              Where do you think CAST got their name from?

              1. jake Silver badge

                Re: And I do wonder how this would work on a Pi Zero

                "Where do you think CAST got their name from?"

                The Childhood Autism Spectrum Test is rather obviously named ...

          2. jake Silver badge

            Re: And I do wonder how this would work on a Pi Zero

            "I do however believe that AI is almost pre-nascent at the moment."

            Nah. Neuroplasticity says no.

            1. Tom 7 Silver badge

              Re: And I do wonder how this would work on a Pi Zero

              Gurgle goo goo!

              1. jake Silver badge

                Re: And I do wonder how this would work on a Pi Zero

                That was a trifle childish.

          3. Anonymous Coward
            Anonymous Coward

            Re: And I do wonder how this would work on a Pi Zero

            "I'd steer clear of the pavement if I was you!"

            Depends whether the server that trained the model is on the same side of the Atlantic; otherwise you might want to steer on the pavement.

            1. jake Silver badge

              Re: And I do wonder how this would work on a Pi Zero

              Steers belong in the pasture, not on the pavement.

              1. quxinot Silver badge

                Re: And I do wonder how this would work on a Pi Zero

                Steers belong on the plate, not in the pasture.

    2. Wellyboot Silver badge

      The paper gives training times as 1-11 GPU days depending on the various methods used (the GPU setup is 4x NVIDIA RTX 2080). From this I can speculate that the MCU operation will be a very slow process, looking at a picture stored in flash.

      It's just an academic exercise at the lowest edge of computing and hats off to them for getting it to work at all on such a small footprint.

      1. Cronus

        Training is a much more computationally expensive operation than inference. Once you have a trained model, getting output given some input is trivially cheap and fast in comparison.
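        The asymmetry is easy to illustrate. A minimal sketch: a hypothetical 3-input perceptron whose weights are frozen after (expensive) training; inference is just a dot product and a threshold, cheap enough for any MCU. The weights here are made up for illustration:

        ```c
        #include <stdio.h>

        /* Weights frozen after training -- these values are illustrative only */
        static const int w[3] = {2, -1, 1};
        static const int bias = -1;

        /* Inference: one dot product plus a step activation */
        int infer(const int x[3]) {
            int acc = bias;
            for (int i = 0; i < 3; i++) acc += w[i] * x[i];
            return acc > 0;
        }

        int main(void) {
            int a[3] = {1, 0, 1};
            int b[3] = {0, 1, 0};
            printf("%d %d\n", infer(a), infer(b));
            return 0;
        }
        ```

        The part that costs GPU-days is finding `w` and `bias`; running them costs a handful of multiply-accumulates.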

    3. Luke McCarthy

      It doesn't specify which STM32 they used in the paper, and the RAM available varies quite a lot depending on the model. The highest-end models have 1MB, and they go as low as 16KB. The image format could have been 8-bit or even 1-bit since it's only character recognition, and the data could have been streamed so the whole image wouldn't have to fit in memory. Also, some STM32s have a DRAM controller which would allow them to access several megabytes of memory.
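      The streaming idea takes only a few lines: process a 1-bit image one row at a time so only a row buffer, not the whole frame, lives in RAM. Here `read_row()` is a made-up stand-in for whatever actually feeds the MCU pixels:

      ```c
      #include <stdio.h>
      #include <string.h>

      #define WIDTH_BYTES 4   /* 32 pixels at 1 bit each */
      #define HEIGHT 32

      /* Stand-in pixel source: a diagonal line, one bit set per row */
      static void read_row(int y, unsigned char row[WIDTH_BYTES]) {
          memset(row, 0, WIDTH_BYTES);
          row[y / 8] = (unsigned char)(0x80 >> (y % 8));
      }

      int main(void) {
          unsigned char row[WIDTH_BYTES];  /* 4 bytes of RAM, not 32*32/8 = 128 */
          int set = 0;
          for (int y = 0; y < HEIGHT; y++) {
              read_row(y, row);
              for (int i = 0; i < WIDTH_BYTES; i++)
                  for (int b = 0; b < 8; b++)
                      set += (row[i] >> b) & 1;
          }
          printf("set pixels seen: %d\n", set);
          return 0;
      }
      ```

      A convolution or feature extractor that only needs a few rows of context can work the same way.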

    4. fajensen Silver badge

      Maybe by using Stochastic Computing - https://spectrum.ieee.org/computing/hardware/computing-with-random-pulses-promises-to-simplify-circuitry-and-save-power

      Very simplistically, data is striped into serial streams of bits, where randomised long runs of '1's and '0's represent the values. Complex calculations can then be performed on the streams with very simple logic circuitry, thus using very little power. The tricky operation is the randomising process, but they are fixing that (and a PRNG will still work; it's just that it is complex and burns power).
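      A crude sketch of the idea, with ordinary `rand()` standing in for the hardware randomiser: encode a value in [0,1] as a bitstream whose density of 1s equals the value; ANDing two independent streams then approximates their product, i.e. a multiplier built from a single gate per bit:

      ```c
      #include <stdio.h>
      #include <stdlib.h>

      #define N 100000  /* stream length: longer streams give better precision */

      int main(void) {
          srand(42);
          double p = 0.5, q = 0.4;
          long ones = 0;
          for (long i = 0; i < N; i++) {
              int a = ((double)rand() / RAND_MAX) < p;  /* stream encoding p */
              int b = ((double)rand() / RAND_MAX) < q;  /* independent stream for q */
              ones += a & b;                            /* one AND gate per bit */
          }
          printf("estimated p*q = %.3f (exact 0.200)\n", (double)ones / N);
          return 0;
      }
      ```

      The trade-off is exactly as the IEEE piece describes: the logic is trivial, but precision grows only with stream length.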

    5. Jason Bloomberg Silver badge
      Headmaster

      At three one-byte values per RGB pixel, a 26x26 image fills your 2K RAM completely

      That 48x48 icon on the right is just 1,772 bytes (the 32x32 in the selection options just 928), and quite easily recognisable ->

      It should be possible to decode that into a bitmap on-the-fly, and I would guess there are other tricks which could be used if one has plenty of code memory and isn't so fussed about speed.

  3. The Man Who Fell To Earth
    Meh

    Yawn

    I remember in the 1970s programming my Texas Instruments SR-52 with a pick-up sticks machine-learning program. As you played more games against it, it learned to play better until it became unbeatable.

    One could also write self modifying code (pdf) on this calculator.
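    Something in the spirit of that SR-52 program can be sketched as a MENACE-style reinforcement learner (the game, weights and training scheme here are illustrative, not the original code): players alternately take 1-3 sticks and whoever takes the last one loses; every legal move keeps a weight, and moves on the losing side get penalised:

    ```c
    #include <stdio.h>
    #include <stdlib.h>

    #define PILE 15

    static int weight[PILE + 1][4];  /* weight[sticks][take] */

    /* Greedy policy: the heaviest legal move for this pile size */
    static int pick(int sticks) {
        int best = 1;
        for (int t = 2; t <= 3 && t <= sticks; t++)
            if (weight[sticks][t] > weight[sticks][best]) best = t;
        return best;
    }

    int main(void) {
        srand(1);
        for (int s = 1; s <= PILE; s++)
            for (int t = 1; t <= 3; t++) weight[s][t] = 10;

        for (int game = 0; game < 20000; game++) {
            int sticks = PILE, turn = 0;   /* learner moves on turn 0 */
            int moves[PILE][2], n = 0;     /* learner's (sticks, take) history */
            while (sticks > 0) {
                int take;
                if (turn == 0) {
                    /* explore occasionally, otherwise play the heaviest move */
                    take = (rand() % 10 == 0) ? 1 + rand() % 3 : pick(sticks);
                    if (take > sticks) take = sticks;
                    moves[n][0] = sticks; moves[n][1] = take; n++;
                } else {
                    take = 1 + rand() % 3;  /* random opponent */
                    if (take > sticks) take = sticks;
                }
                sticks -= take;
                if (sticks == 0) {
                    /* whoever just took the last stick loses */
                    int delta = (turn == 0) ? -1 : +1;
                    for (int i = 0; i < n; i++)
                        weight[moves[i][0]][moves[i][1]] += delta;
                }
                turn = 1 - turn;
            }
        }

        /* With 2 sticks left, taking 1 forces the opponent to take the last */
        printf("from 2 sticks the trained learner takes %d\n", pick(2));
        return 0;
    }
    ```

    The whole state table is 64 small integers, which is exactly the kind of thing that fits on a programmable calculator.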

    1. Anonymous Coward
      Anonymous Coward

      Re: Yawn

      I once wrote a proof of concept Python program that plays rock, paper, scissors against a human opponent and learns their pattern of responses, so gradually winning more and more often. It would easily have fitted into quite a small microcontroller - in fact 16 bits and about 2k of RAM would do it, 1k if you had a dedicated display and buttons.
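      A minimal sketch of that frequency-counting approach (the names and the toy opponent are illustrative): count how often the opponent follows each throw with each other throw, predict the most likely next throw, and answer with its counter:

      ```c
      #include <stdio.h>

      enum { ROCK, PAPER, SCISSORS };

      static int transitions[3][3];  /* transitions[prev][next] counts */

      /* Most likely throw to follow prev, from observed counts */
      static int predict(int prev) {
          int best = ROCK;
          for (int t = PAPER; t <= SCISSORS; t++)
              if (transitions[prev][t] > transitions[prev][best]) best = t;
          return best;
      }

      /* The throw that beats t: paper beats rock, scissors beat paper, ... */
      static int counter(int t) { return (t + 1) % 3; }

      int main(void) {
          /* A predictable opponent who always follows rock with paper */
          int history[] = {ROCK, PAPER, ROCK, PAPER, ROCK, PAPER, ROCK};
          int n = sizeof history / sizeof *history;
          for (int i = 1; i < n; i++)
              transitions[history[i - 1]][history[i]]++;
          /* After a rock we expect paper, so we should play scissors */
          printf("predicted: %d, we play: %d\n",
                 predict(ROCK), counter(predict(ROCK)));
          return 0;
      }
      ```

      Nine counters and a couple of table lookups: the 2K-of-RAM estimate is, if anything, generous.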

  4. Steve Todd

    Blue Pill

    They were probably using these:

    https://wiki.stm32duino.com/index.php?title=Blue_Pill

    Less than £5 from your electronics tat store of choice.

  5. Boring Bob

    What is new here?

    In the 1980s I was programming neural nets on microcontrollers smaller than this for audio recognition. I assume they have developed something clever to deserve an article on this website; could someone point out what is new here?

    1. Lee D Silver badge

      Re: What is new here?

      It's now cloud IoT hyper-convergence AI neural nets with fabric... and chintz and...

      Pretty much, nothing's changed in the meantime except we have slightly faster computers but pretty much the end result is no different... we can just "afford" to bundle this junk into little speakers and your search menus where we couldn't before.

      In terms of things actually *learning* or doing useful things, we're still stuck in the absolute dark ages of the technology, where really the problem is "unlearning". Having a machine that trains itself on a million images is all well and good until you spot something it's doing wrong, and then you basically have to retrain it from scratch: it has 1,000,000 entries saying it's doing the right thing and one saying it's doing the wrong thing, and it has to resolve that somehow without losing all the subtle nuances it was trained on (i.e. you can't just weight the error 1,000,000 times more than the others).

      AI, since the 80s and through to today, learns and then plateaus just on the cusp of usability and then *stays there forever*. It's almost a perfect PhD research topic - do it, write it up, hope nobody ever asks you to apply it to anything else that doesn't involve literally starting from scratch every time a change is made.

      And, even then, it's generally only 90-something % accurate which is pretty useless compared to even a trained dog.

  6. heyrick Silver badge

    sports just 2KB of RAM and 32KB of flash storage

    Don't knock devices because their specs appear to be puny.

    The entire Minitel terminal was built around an 8051 device with a 16K ROM, 256 bytes of RAM, and an 8K screenbuffer RAM (for the video chip), and an amazing V.23 modem!

    1. jake Silver badge

      Re: sports just 2KB of RAM and 32KB of flash storage

      Yeahbut ... The Minitel terminal was just that, a dumb terminal. Without the 1275 modem giving it access to the Teletel network, it was just a lump of plastic. Arguably, DEC's VT52 was more capable.

      1. JLV Silver badge

        Re: sports just 2KB of RAM and 32KB of flash storage

        Quite competent at sucking up your $$$ at $5-10/hr on chat lines though.

        First month was a $400 bill. But didn’t happen again.

      2. YetAnotherJoeBlow

        Re: sports just 2KB of RAM and 32KB of flash storage

        I used to write TECO scripts to play games on a VT52 and later the VT100. The CPUs were a DEC PDP-11/70 and a PDP-10. A good example of a TECO script:

        https://github.com/PDP-10/its/blob/master/src/_teco_/teco.1212

  7. Anonymous Coward
    Anonymous Coward

    Emperors new clothes and neural network humbug

    I can't help feeling that ANNs are essentially a symbol of computational laziness. The basis of rational scientific approaches to problems is to have some understanding of a mechanism - otherwise you are basically doing astrology. 'I know 5 people born in early March who are artists, so Pisces are creative'. Hence adversarial images, unintended biases in decision support systems etc. If a problem can be reduced to a solver in 2KB of RAM then a bright human ought to be able to work out what the network is doing, and produce code that beats it for efficiency.

    1. Lee D Silver badge

      Re: Emperors new clothes and neural network humbug

      ANN is just "a statistics based magic box".

      You plug things in. It makes some kind of spurious and random correlation between what you're teaching it and the input data, which you can't really interrogate, understand, improve or modify.

      If you plug enough things in, it might train itself enough to work a percentage of the time. Then untraining, retraining or anything else? It's pretty much throw it away and start again - its existing "superstition-based" intuition will trip over every exception to the point that it becomes painful to make any significant change after the learning plateaus.

      It's been an unsolved problem for decades, we just get to throw more and shinier hardware at the same problem.

      You would otherwise notice, for instance, that Google searches would become personalised, almost-psychic, tailored to every user... Siri would know what you want before you ask it... because millions upon millions of users are training it daily and it doesn't sleep - it should be learning exponentially. It's not. And over time everything you interact with that has "AI" would get better and better... it doesn't. Improvements are microscopic at best after the initial plateau.

      Literally Siri and Google should be Skynet by now. The reason they aren't... AI and ANN in particular just doesn't work like that.

      What we need is a really, really radical re-think of the whole thing with something entirely different. Because genetic algorithms are the same, ANN, everything we try hits the same problem. And for every million images you train it on, it takes a significant percentage of those images again to retrain it for the ones it gets wrong. And again. And again. With diminishing returns.

      1. DCFusor Silver badge

        Re: Emperors new clothes and neural network humbug

        I get the idea Lee has a clue here.

        Somewhere around the '90s, Timothy Masters pointed out that you really wanted a sigmoid type activation function, else one neuron becoming wildly activated "too sure of itself" would cause errors.

        As would using more than at most 4 layers, four being enough to solve any problem soluble by MLFF neural networks. He cautioned strongly about overfitting due to too large a network for the available training data, and even gave examples of the types of errors you get when you don't follow that advice.

        While not as dramatic as mistaking turtles for guns and some of the funnier examples we see today, I kind of grin at the naivete of the new people discovering this all over again - (or failing to discover anything because they didn't learn the underlying math and intuition required).

        So, now we have to go "deep" - too many layers. To do that, even on new hardware, we use RELU activation functions (training seems to be faster) - so we return to the "too sure of itself errors" predicted. And with too many layers and too many coefficients we now overfit even more, essentially committing every mistake Masters (and no doubt others) warned of and getting exactly the results predicted by them almost 30 years ago.

        At best, these things are classifiers; the rest is pure hype. When over-fit and given noisy data... we see the results, I need not go on.

        This is not to say that you can't get useful results in a small CPU. It's just that you do better when you pay attention to how things work, and don't get distracted by quick training times that produce the odd lucky guess - because a lucky guess is all that is, not real world day in and day out performance.
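        The "too sure of itself" point is easy to see numerically: a sigmoid squashes any pre-activation into (0,1), while ReLU passes a large value straight through, letting one neuron dominate a layer. A minimal sketch:

        ```c
        #include <stdio.h>
        #include <math.h>

        /* Sigmoid saturates: output bounded in (0,1) no matter the input */
        static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

        /* ReLU is unbounded above: large activations pass through unchecked */
        static double relu(double x)    { return x > 0 ? x : 0; }

        int main(void) {
            double wild = 50.0;  /* an unusually large pre-activation */
            printf("sigmoid(50) = %.4f (capped near 1)\n", sigmoid(wild));
            printf("relu(50) = %.4f (passed through at full strength)\n", relu(wild));
            return 0;
        }
        ```

        ReLU's cheap gradient is why deep nets train faster with it, but the saturation Masters argued for is exactly what it gives up.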

    2. Mike 16 Silver badge

      Re: Pisces Artists

      Just FYI, my wife is a Pisces, and an Artist (in the sense of "Sells her stuff in a few galleries, albeit some orders of magnitude less expensive than a Damien Hirst slowly-rotting shark". OTOH, Hirst is an Aries, so maybe the Rams get the Big Bucks)

      As for "code that beats it for efficiency": Not gonna happen as long as managers approve the purchase orders for snazzy software tools and hold mere code-jockeys in lower regard than phone sanitizers. Remember that Time to Market is the _only_ metric.

  8. jake Silver badge

    Machine Learning in 2K of RAM?

    SAIL just called from the early 1960s. They want their code back.

  9. AceRimmer1980
    Terminator

    Cautionary Tale

    "And the dish ran away with the spoon", by Paul Di Filippo

  10. martinusher Silver badge

    There's this book.....

    ...on my shelves about DIY Artificial Intelligence (or, more accurately, Machine Learning). It describes predictive algorithms and gives you some to try. Since it was published in 1983 or 1984, the examples it gives are in some early form of BASIC. It does explain in the introduction that, yes, researchers tended to use exotic logic programming languages, but you don't actually need that stuff to demonstrate how these algorithms work and even get useful results out of them.

    There's wisdom in that text.....I've often thought that trainee programmers should be given small, slow, computers to work with so that they get a bit of a feel for how code gets executed.

    1. heyrick Silver badge

      Re: There's this book.....

      I completely agree. I believe that courses teaching programming (of any serious fashion) should include a stint with something like a BBC Micro (Apple2 for Americans), something where you can actually probe and observe every single signal to understand how it actually works. None of this "little black lump of magic".

      It's also a good exercise in demonstrating 40-bit (five byte) floating point numbers on a machine where the processor has only two registers and an accumulator, no FP, no multiply, and treats everything that isn't an address as an eight-bit value.

  11. _LC_ Bronze badge
    Boffin

    2 KB - I wonder

    These cost less than a buck and a half (including shipping!):

    https://www.aliexpress.com/item/Free-Shipping-STM32F103C8T6-ARM-STM32-Minimum-System-Development-Board-Module-For-arduino-CS32F103C8T6/32981849126.html?spm=2114.search0104.3.15.b0b74af3upXz8G&ws_ab_test=searchweb0_0,searchweb201602_6_10065_10130_10068_10547_319_317_10548_10696_10190_453_10084_454_10083_10618_10307_10820_10821_10303_537_10302_536_10059_10884_10887_321_322_10103,searchweb201603_52,ppcSwitch_0&algo_expid=14d34b07-e1db-4245-991a-34b7b9433584-2&algo_pvid=14d34b07-e1db-4245-991a-34b7b9433584&transAbTest=ae803_5

    STM32F103C8

    ARM®32-bit Cortex®-M3 CPU Core. 72 MHz maximum frequency,1.25 DMIPS/MHz (Dhrystone 2.1) performance at 0 wait state memory access. ...

    64 or 128 Kbytes of Flash memory. 20 Kbytes of SRAM.

    2.0 to 3.6 V application supply and I/Os. ...

    Low-power. ...

    2 x 12-bit, 1 μs A/D converters (up to 16 channels) ...

    DMA.

  12. Aussie Doc
    Coat

    Proof at last...

    I wonder if this sort of information will be given as the 'evidence' to show that Huawei can't be trusted because the tech exists therefore it must be so.

    I'll grab my tin foil hat whilst I wait.

    /s
