RIP ROP: Intel's cunning plot to kill stack-hopping exploits at CPU level

Intel is pushing a neat technique that could block malware infections on computers at the processor level. That's the 40,000ft view of the new safety mechanism, the details of which were published on Thursday. What's really going on is this: Intel's so-called Control-flow Enforcement Technology (CET) [PDF] attempts to thwart …

  1. a_yank_lurker

    Interesting

    Sounds like a good idea. But a lot of interesting ideas have failed miserably.

  2. Charles 9

    Is there a reason no one's tried to introduce a guarded stack: one that can ONLY be manipulated by PUSH's and POP's such that any attempt to smash or otherwise alter it throws an exception? If you can flag a "shadow stack" as protected memory, why not just flag the ordinary stack as protected?

    1. inmypjs Silver badge

      "why not just flag the ordinary stack as protected"

      Because in (almost?) all programming language implementations the stack also contains data (local variables).

      I don't see why a compiler can't implement separate data and return address stacks and provide the same solution without having to buy more new silicon from Intel. Guess no one thinks it is worth the effort and inefficiency.

      If you are going to throw more silicon or processor cycles at the problem I would think they are better spent on bounds checking to catch overflows before they trash return addresses.

      1. Charles 9

        "Because in (almost?) all programming language implementations the stack also contains data (local variables)."

        And last I checked, there are plenty of alternative ways around that. If the parameters are popped into registers or local memory when the function starts, that gets around it. Passing by register for low-parameter-count functions is an option, too. If this is the price for having a hardened stack, it may be worth paying. As I think about it, do CPUs these days also check for 1:1 stack use by functions (checking that SP at CALL = SP after RET) to guard against stack misalignment?

        What you propose is basically a variant of Intel's idea, BTW (the shadow stack is your call stack). They probably can't do a full separation for legacy reasons since the logic in most CPU architectures is that RET pops the return address.

        As for catching overflows, that's nontrivial to solve since a function may be required to work on items outside of its local context (pointer dereferencing, for example), creating conflicting issues of context. Due to the architecture, bounds checking has to be left to the code itself, especially when speed efficiency is required.

        1. Voland's right hand Silver badge

          And last I checked, there are plenty of alternative ways around that

          Recursion, see recursion.

          1. Charles 9

            A direct push before a register pass still allows for this. It's one reason modern chips keep larger numbers of registers. Excessive recursion will overflow the stack no matter what. There are also concepts like placing parameters in a structure and passing a pointer to it by register (rearranges the parameter transfer a bit but makes for a cleaner stack).

        2. joeldillon

          Passing by register only, only works for leaf functions. If your low-parameter-count function calls another function, where does it store its own parameters and locals while that other function is running? Oh right, the stack.
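
To make the leaf-function point concrete, here is a minimal sketch in Python of a toy machine with a single argument/result register. The names `registers`, `stack` and `factorial` are invented for this illustration and do not model any real calling convention. The non-leaf case has to save its live value somewhere before the callee overwrites the register, and that somewhere is the stack.

```python
registers = {"r0": 0}   # a single argument/result register (toy model)
stack = []              # where live values wait while a callee runs


def factorial():
    """Argument arrives in r0; the result is returned in r0."""
    n = registers["r0"]
    if n <= 1:
        registers["r0"] = 1        # leaf case: no call, so no stack needed
        return
    stack.append(n)                # spill n: the callee will overwrite r0
    registers["r0"] = n - 1
    factorial()                    # non-leaf case: r0 is clobbered here
    registers["r0"] = stack.pop() * registers["r0"]


registers["r0"] = 4
factorial()
print(registers["r0"])
```

Adding more registers only postpones the spill; any call chain deeper than the register file still needs memory that behaves like a stack.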

      2. Kanhef

        The problem with having a software-defined return address stack is that there's nothing to keep malicious code from manipulating it; as far as the processor is concerned, it's just another region of the process' memory. A hardware-defined shadow stack can more effectively restrict access: the processor itself is the only thing that should manipulate this area of memory (as a side effect of call and return instructions), so any attempt to alter it directly can trigger an exception.

        I'm not intimately familiar with x86 instructions (I'd rather be dealing with Power or ARM), but it looks like this could be defeated if there's a way to write arbitrary data to the EIP register. Overwrite EIP, call the next instruction, and you've put your desired return address on the shadow stack.
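
As a rough illustration of the mechanism described in this comment (the call records the return address in two places, and the return compares them), here is a toy Python model. `ToyCPU` and `ControlFlowViolation` are hypothetical names made up for this sketch; it does not model Intel's actual instructions, privilege handling or exception delivery.

```python
class ControlFlowViolation(Exception):
    """Raised when the two copies of a return address disagree."""


class ToyCPU:
    """Toy model: call() pushes to both stacks, ret() compares them."""

    def __init__(self):
        self.data_stack = []      # ordinary stack: writable by the program
        self._shadow_stack = []   # shadow stack: touched only by call/ret

    def call(self, return_address):
        self.data_stack.append(return_address)
        self._shadow_stack.append(return_address)

    def ret(self):
        from_data = self.data_stack.pop()
        from_shadow = self._shadow_stack.pop()
        if from_data != from_shadow:
            raise ControlFlowViolation(
                f"return to {from_data:#x}, expected {from_shadow:#x}")
        return from_data


cpu = ToyCPU()
cpu.call(0x401000)
print(hex(cpu.ret()))             # untouched return address: accepted

cpu.call(0x401000)
cpu.data_stack[-1] = 0xDEADBEEF   # simulated overwrite of the return address
try:
    cpu.ret()
    caught = False
except ControlFlowViolation:
    caught = True
```

In this model the only way to get a value onto the shadow stack is `call()`, which is the property the comment attributes to the hardware version.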

        1. inmypjs Silver badge

          "The problem with having a software-defined return address stack is that there's nothing to keep malicious code from manipulating it"

          If malicious code is running you already lost.

          The point of separate return stacks is to make it harder for malicious data to corrupt return addresses and so allow malicious code to execute.

          1. Charles 9

            "If malicious code is running you already lost."

            Then the war is unwinnable because you MUST assume malicious code is already running. We need to change the model to assume (like in the real world) the system IS potentially compromised and find a way to keep running DESPITE it.

            1. Anonymous Coward

              No argument here! So many of the systems I worked on and later designed were critical, potentially resulting in serious loss of life and rather large geographical regions; you had to get it right. Especially in the wrong hands. Always assume, for example, that the adversary has complete and accurate intelligence on your systems. Two man rule, now TFA....

              Such considerations are coming to the fore, finally. Provably, rather than probably, correct operating systems are still somewhat out of reach. We think that we have at least a minimal kernel done.

              [I've worked in the nuclear field (huge booms), safety from hazards of electromagnetic radiation to personnel, fuel & ordnance (moderate size booms) and in medicine. Kinda had to make sure I did everything exactly right. And provide for sabotage, enemy action, human stupidity and of course Murphy.]

              1. Anonymous Coward

                "Such considerations are coming to the fore, finally. Provably, rather than probably, correct operating systems are still somewhat out of reach. We think that we have at least a minimal kernel done."

                But they run right into the demand for absolute speed or the project is not worth it. You can do it fast, you can do it right, but what happens when they demand it be done right & fast at the same time? AND on time AND under budget?

                I think this is the biggest problem today. It's not that people aren't coding secure by design. It's that they're too constrained to do so. This is perhaps the one industry where everyone expects you to deliver a unicorn by yesterday and make the demand with a straight face.

                1. Brewster's Angle Grinder Silver badge

                  There's a XKCD for this, but I can't be bothered to find it.

                  "This is perhaps the one industry where everyone expects you to deliver a unicorn by yesterday and make the demand with a straight face."

                  Look, you have a working horse. How much harder can it be to add one little horn? It doesn't have to have a spiral or anything; a plain horn is fine. Just get me a unicorn and be a bit more positive about it.

                  1. Charles 9

                    Re: There's a XKCD for this, but I can't be bothered to find it.

                    "Look, you have a working horse. How much harder can it be to add one little horn?"

                    You forget the "by yesterday" requirement. Pardon me, but Time Lords are few and far between, and the one we know doesn't have what one would call a stable or always-agreeable personality.

      3. Anonymous Coward

        Because of how stacks have been implemented in the Intel x86 (and its 64 bit extension) architecture, a compiler can't simply overrule the CPU and implement its own stack(s).

        For example, in some calling conventions - which may be used by external libraries you don't control - stack data are cleared by the callee using a RET n instruction. That implies that both call data and return addresses are on the same stack. Also, when going back and forth between privilege levels, stacks are automatically switched by the processor itself. Even switching threads implies switching stacks.

        Thereby, a single compiler can't implement its own stack(s) management and expect it to work on an actual OS and processors without issues. Unless it becomes a VM handling everything inside, but performance will drop a lot.

      4. Michael Wojcik Silver badge

        If you are going to throw more silicon or processor cycles at the problem I would think they are better spent on bounds checking to catch overflows before they trash return addresses.

        More expensive, and frequently infeasible in native code.

        You violate the C language specification if you attempt to make it pass bounds information around with arrays, for example. Keeping that information out-of-band, in some kind of side table, becomes very expensive indeed - particularly due to its lookup overhead and impact on locality of reference.

        We have languages that do bounds-checking. Some people use them. A relatively small minority can't afford to. Many can't be bothered.
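
For a feel of the "side table" cost mentioned above, here is a minimal Python sketch. `BoundsTable`, `checked_store` and the 64-byte `memory` are all invented for this illustration. Every store pays for a table lookup first, and this simple version only checks that the address lands inside some registered allocation, not inside the object the pointer was originally derived from, so it is weaker than real bounds checking.

```python
import bisect


class BoundsTable:
    """Out-of-band table of allocations: base address -> size (toy model)."""

    def __init__(self):
        self._bases = []   # sorted base addresses, for binary search
        self._sizes = {}

    def register(self, base, size):
        bisect.insort(self._bases, base)
        self._sizes[base] = size

    def check(self, address):
        """Return True if address falls inside a registered allocation."""
        i = bisect.bisect_right(self._bases, address) - 1
        if i < 0:
            return False
        base = self._bases[i]
        return address < base + self._sizes[base]


memory = bytearray(64)
table = BoundsTable()
table.register(base=16, size=8)    # one 8-byte "array" at addresses 16..23


def checked_store(address, value):
    """Store one byte, but only after an O(log n) side-table lookup."""
    if not table.check(address):
        raise IndexError(f"out-of-bounds store at address {address}")
    memory[address] = value


checked_store(16, 0x41)
checked_store(23, 0x42)
```

A store to address 24 raises `IndexError` here; in native code the equivalent lookup on every access is where the overhead and the locality-of-reference hit come from.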

    2. Voland's right hand Silver badge

      And where will you store function call parameters?

      Stack is not only for addressing, it is for function call params too.

      What Intel is proposing is not far off - it is two separate stacks - one legacy - params and returns, and one only for return addresses. The so called shadow stack is very similar to what you are describing - a program can manipulate it solely via subroutine calls and returns.

      By the way, this still leaves a large exploit category not covered - it is of little or no help with heap exploits.

    3. Ken Hagan Gold badge

      The full proposal is quite complicated and has to resolve questions around indirect branches, FAR CALLs, privilege transitions and interrupts and other details of the Intel architecture that, although mostly unused, are still necessary to produce a working operating system. The proposal also works with existing code.

      To summarise, if you had wanted to introduce this feature about 40 years ago, it would have been trivial and (quite probably) implemented purely as a compiler code-gen strategy. If you want to implement it now, it is "quite fiddly".

      (Actually, implementing it 40 years ago would have been a little fiddly as well. Traditionally the heap grows upwards from the bottom of a segment and the stack grows down from the top. A second stack would have to find a third "end" to grow from. Not insurmountable, but enough of a headache that you'd optimise the solution by storing return addresses and automatic variables in the same stack.)

      1. Roland6 Silver badge

        Re: Actually, implementing it 40 years ago would have been a little fiddly as well.

        Whilst not 40 years ago, the i286 did include some rather useful stack handling and memory protection instructions; the only catch I could determine was that the only effective way to utilise them was to write in ASM. I suspect that what Intel have proposed here is a set of functions and instructions that are more supportive of higher level languages and hence usable by compiler writers.

        For example, the Intel i86 memory segmentation model was very easy to use in ASM, but a compiler writer had to vastly simplify it, which is why many compilers either supported only a 64K program segment and a 64K data segment or, where they did embrace segmentation, limited the models that could be used.

        1. Anonymous Coward

          The segmentation model was mostly an issue in 16-bit protected mode because each segment was limited to 64K. That made handling large data structures especially awkward. Using segmentation is not very difficult - the compiler needs to keep track of where instructions and data are, and then load the desired segment, or use a "far address" whenever needed. Compilers already did this. Compilers generate CPU opcodes, so programmers should rarely need to use assembly themselves.

          The drawback of segmentation in protected mode is that each access outside the current segment goes through a series of security checks which slow down the execution.

          Thereby it wasn't used in any commercial OS I know. AFAIK, in 64 bit mode segmentation is actually removed. I'm not surprised, because it was designed by AMD which actually has no clue how to make a sound CPU design (fast, maybe, sound, no).

          Proper segment use would have added a more secure layer to OSes and applications. Unluckily, the need for speed is more commercially viable than security. Thus enjoy a single writable code segment and a single executable data segment (with the NX bit trying to stop it for some pages). Maybe the big new features of 2030 processors will be protected memory segments...

          Most protection instructions are designed to be used only by code running at privileged ring levels (the critical ones at ring 0 only), and they are thereby designed to be used by an operating system kernel, not userland applications. What Intel proposes here is automatic management of call return addresses by the CPU, so no changes to existing code are needed.

      2. Brewster's Angle Grinder Silver badge

        "To summarise, if you had wanted to introduce this feature about 40 years ago, it would have been trivial and (quite probably) implemented purely as a compiler code-gen strategy."

        All hail the 6809 and its two stacks: the "system" stack and the "user" stack. Great for Forth (the computation stack and the return stack) but useless for much else. It did function as a useful extra index register, though.
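
For anyone who hasn't met the Forth arrangement mentioned here, a minimal sketch in Python of an evaluator that keeps operands and return positions on two separate stacks. The `run` function and the tiny word set are invented for this example and are nowhere near a real Forth; the point is only that data operations never touch the return stack.

```python
def run(program, words):
    """Evaluate a list of tokens using separate data and return stacks."""
    data_stack = []      # operands only
    return_stack = []    # (token list, resume index) pairs only
    code, pc = program, 0
    while True:
        if pc == len(code):
            if not return_stack:
                return data_stack
            code, pc = return_stack.pop()    # "return" from a word
            continue
        token = code[pc]
        pc += 1
        if isinstance(token, int):
            data_stack.append(token)
        elif token == "+":
            b, a = data_stack.pop(), data_stack.pop()
            data_stack.append(a + b)
        elif token == "*":
            b, a = data_stack.pop(), data_stack.pop()
            data_stack.append(a * b)
        elif token == "dup":
            data_stack.append(data_stack[-1])
        else:
            return_stack.append((code, pc))  # "call" a user-defined word
            code, pc = words[token], 0


words = {
    "square": ["dup", "*"],
    "sum-squares": ["square", 4, "square", "+"],
}
result = run([3, "sum-squares"], words)      # 3*3 + 4*4
print(result)
```

Nothing a word does to its operands can reach a return position in this layout, which is the separation the thread keeps circling around.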

        1. Mike 16

          System + User Stacks

          Predate the 6809 by quite a bit. A separate stack pointer (and memory area pointed to) for "kernel" code is part of pretty much any architecture that includes memory management and more than one "mode". IIRC, the higher end models of the PDP-11 had three stacks, somewhat analogous to the multiple rings of the x86 (x>1), and the systems that inspired them.

          As for Segments, the 286 was an interesting attempt to allow the Multics memory model, where each data object was (or could be) its own segment. Alas, the very small number of actual segment registers (and accompanying "cache" register equivalents) meant that there was a lot of flailing, and segment reloads were pig-slow. That and the rush of newly-minted C programmers, who thought that all that verbiage about what references via a pointer (e.g. what could be relied upon to work) were "more like guidelines, really" meant that model was doomed. Still interesting...

          1. Brewster's Angle Grinder Silver badge

            Re: System + User Stacks

            I'm too young to have programmed a PDP-11. (*flutters eyes attractively* - hey, eyelashes are one of the things that don't sag with age.) But I'm talking about two stack pointers accessible simultaneously via their own dedicated instructions: PSHS, PSHU, PULS, PULU. We're not talking about a stack in SMM mode or kernel mode that the CPU switches to on an interrupt or at the call gate. Imagine if AMD had introduced rsp2 and push2 and pop2 instructions to go with it.

      3. Michael Wojcik Silver badge

        Actually, implementing it 40 years ago would have been a little fiddly as well.

        It was done 40 years ago. It's just a matter of dispensing with the linear stack.

        Customers didn't want to pay for it.

    4. Michael Wojcik Silver badge

      Is there a reason no one's tried to introduce a guarded stack

      Things along these lines have been done, at least using displays rather than a simple linear stack, on capability machines. I know I've seen articles on the subject, though I can't dig up a reference at the moment.

      The main drawback is the same one we always see with displays and/or capability architectures: performance. The linear stack, particularly a hardware-assisted one like you have with x86 / x64, is very fast. Most customers aren't willing to take that performance hit in order to gain increased security.

      Some are, which is why IBM still sells a lot of the System i (former AS/400) machines.[1]

      [1] What about System z? I'm not much of a z assembly programmer, but when I tinkered with it back in the day, it was a display architecture - BALR and associated template code in effect constructing a new stack frame on a non-linear linked stack. But z doesn't protect display frames from rogue code running in the same TCB ("process").

      1. Anonymous Coward

        Sys/z has, and has had since the days it was called System/370, support for doing something very similar to what Intel just came up with "on their own": A stack just for procedure linkage, modifiable only by control transfer instructions.

        It wasn't introduced as an exploit mitigation though, but rather a way to use the same linkage stack across privilege levels.

  3. Binnacle

    Silver Bullet

    A pleasure if this turns out to be the long awaited silver bullet against malware attack.

    Perhaps it deserves the label "Platinum Bullet" considering the engineering effort required to make it happen. A world without botnets, a world with unemployed Russian cyber-criminals, a world where governments can't so easily barge uninvited into everyone's lives--quite a picture.

    1. Voland's right hand Silver badge

      Re: Silver Bullet

      No it will not.

      Heap overrun exploits.

      1. Binnacle

        Re: Silver Bullet

        >No it will not.

        >Heap overrun exploits.

        WRONG

        Heap-overrun exploits are written using ROP code. Heap regions are non-executable and ROP is the only way to exploit "use-after-free" vulnerabilities.

        This feature will close the door on the nastiest technical vector employed in attacking remote systems and may "tip" the state of overall Internet security to a much better place. Obviously people are stupid and this will not change, but various shades of code white-listing are gradually mitigating the unwashed morons who aggressively "click through any and all warnings." So far no lasting Apple-product botnets, right? 2FA and password quality enforcement have already made a significant dent in the stupid-password problem. Costly and embarrassing hacks have shamed dumb companies and their programmers into applying proper salted hashing to stored passwords.

        Botnets are the single biggest factor in Internet malevolence and sending them off will end the careers of typical cyber-criminals.
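
Since "proper salted hashing" comes up in this comment, here is a minimal sketch of what it can look like using only Python's standard library. The function names and the iteration count are assumptions chosen for this example, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch; tune for real use


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is drawn if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("dadada")
print(verify_password("dadada", salt, stored))
print(verify_password("password", salt, stored))
```

Because each account gets its own salt, two users with the same password end up with different stored digests, so one precomputed table cannot be reused across a stolen database.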

        1. Charles 9

          Re: Silver Bullet

          "So far no lasting Apple-product botnets, right?"

          Only because there aren't that many Macs to go around, but vulnerabilities (NASTY ones at that) still exist. It's just not worth malware writers' time at this point.

      2. Anonymous Coward

        Re: Silver Bullet

        This isn't about preventing stack overflows - those are all but dead already due to e.g. stack cookies and ASLR. Bad Register!

        ROP is used to achieve shellcode execution when all you can accomplish is a jump to a specific address. Frequently resulting from use-after-free bugs for example, when you gain control of the C++ vtable pointer of an object.
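
For readers unfamiliar with the stack cookies mentioned in this comment, a toy Python model of the idea. The frame layout and every name below are invented for this sketch, and real compilers implement the check in the function prologue and epilogue rather than like this. A random value sits between the local buffer and the return address and is verified before the return address is used.

```python
import secrets
import struct

BUF_SIZE = 8
COOKIE = secrets.token_bytes(4)   # random per-run value the attacker can't guess


def make_frame(return_address):
    """Lay out a toy frame: [8-byte buffer][4-byte cookie][4-byte return address]."""
    return (bytearray(BUF_SIZE) + bytearray(COOKIE)
            + bytearray(struct.pack("<I", return_address)))


def unchecked_copy(frame, data):
    """Stand-in for an unbounded copy: long input runs past the buffer."""
    frame[0:len(data)] = data


def function_return(frame):
    """Epilogue: verify the cookie before trusting the return address."""
    if bytes(frame[BUF_SIZE:BUF_SIZE + 4]) != COOKIE:
        raise RuntimeError("stack smashing detected")
    return struct.unpack("<I", frame[BUF_SIZE + 4:BUF_SIZE + 8])[0]


frame = make_frame(0x401000)
unchecked_copy(frame, b"hi")            # fits in the buffer
print(hex(function_return(frame)))

frame = make_frame(0x401000)
unchecked_copy(frame, b"A" * 16)        # overruns buffer, cookie and return address
try:
    function_return(frame)
except RuntimeError as exc:
    print(exc)
```

A linear overrun has to trample the cookie to reach the return address, which is why it gets caught; a write that skips over the cookie is not detected by this check alone.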

        1. Anonymous Coward

          Re: Silver Bullet

          The "R" in ROP stands for "Return". And last I checked, in just about every processor architecture I can think of, the RETurn instruction involved popping the return address off the stack and jumping to it. Meaning modifying that return address (like with carefully-crafted memory operations to avoid breaking cookies/killing canaries) is a key part of ROP.

      3. Vic

        Re: Silver Bullet

        Heap overrun exploits.

        You've mentioned this a couple of times; I still don't see the relevance.

        To overwrite the PC, you still need to get some data into an area that will be loaded into PC - and that's the return address on the stack. If you should manage to exploit a heap overflow to manipulate the return address on the stack, that address will not match the shadow stack when the RET is executed - so the task will be stopped before it gets to your code. And if you attempt to use that exploit to manipulate the shadow stack, that will generate its own exception, killing the exploit.

        Now there might well be issues as yet unrealised in this proposal - but a simple heap exploit wouldn't appear to be one of them.

        Vic.

    2. PassiveSmoking

      Re: Silver Bullet

      There is no silver bullet because computers are operated by humans and the fact that "password" is by far the most common password shows you just how badly they fail at security. Until you can give humans a hardened stack there will always be malware of some sort, it will just rely on social engineering and tricking users into giving it the privilege it wants instead of trying to steal privilege with clever programming hacks.

      1. Adam 1

        Re: Silver Bullet

        My password used to be password, but I changed it to dadada.

        1. Paul Crawford Silver badge

          Re: Password

          Mine used to be dadada but now it is ich lieb dich nicht

        2. bombastic bob Silver badge

          Re: Silver Bullet

          "My password used to be password, but I changed it to dadada."

          you TRYING to put that song into people's heads? heh heh heh

          (password, secret, love, sex, money... and GOD. don't forget GOD. system admins *LOVE* to use GOD).

          and there's that OTHER xkcd, something about "correct horse battery staple"

    3. phuzz Silver badge

      Re: Silver Bullet

      All the technical measures in the world won't protect your OS if the malware can just get the user to click "run".

    4. Pascal

      Re: Silver Bullet

      No Silver (or Platinum) Bullet will ever stop Dave.

      http://i.imgur.com/jIyyeph.jpg

      1. Charles 9

        Re: Silver Bullet

        OK, so how do we fix Dave when You Can't Fix Stupid?

  4. PNGuinn

    "The specification – produced with the help of Microsoft"

    Hmmm ...

    That is all.

    1. Anonymous Coward

      Re: "The specification – produced with the help of Microsoft"

      They produce a major compiler (so they aren't in the dark) and a major consumer operating system family on which a lot of malicious code will run. What do you want instead, for it to be done without their input & behind their back? But enlightened self-interest happens.

      1. Michael Wojcik Silver badge

        Re: "The specification – produced with the help of Microsoft"

        And they have a major security research organization.

        But you can't reason with people who are determined to be wrong.

  5. Anonymous Coward

    Thanks for the article

    Easy to follow and very informative.

    1. Thought About IT

      Re: Thanks for the article

      Indeed. It's the sort of article The Register should stick to, rather than those pushing its climate change agenda.

      1. Steelted

        Re: Thanks for the article

        Climate change is real, dude.

        1. Thought About IT

          Re: Thanks for the article

          "Climate change is real, dude."

          Not sure those running The Register agree with you.

  6. William 3 Bronze badge

    DRM for everyone.

  7. Had_to_be_said

    Looks sweet.

    It would seem to actually address the "problem"... incorporated in the code itself... before "bad code" (an insecure, flawed, or poorly-written, program) could be, externally, exploited.

    And, it looks much better than the possibility of industry abuses caused by the "signing" and "Trusted Computing", lock-down, proposals usually put forth (as bolted-on afterthoughts).

    1. Anonymous Coward

      Re: Looks sweet.

      You fail to understand that signing - and the infrastructure needed to check signatures - are there not to protect code while it's executing, but to prevent on-disk modification. Any protection for executing code is useless if code is actually changed into something bad before being executed.

      Who is the "Master of the Keys" may be debatable, but unless you can trust what you're executing, you won't be protected just by a check of the return address.

      Unless you're among those who like some kinds of vulnerabilities because they allow for cracks and use of commercial programs without paying for the licenses. There is a lot of free and open source stuff today, use it if you don't like to pay for software. Especially since many cracks are malware themselves.

      1. Anonymous Coward

        Re: Looks sweet.

        I find some people's attitude to security funny. Do they usually allow anybody to enter their house/company and then obsessively check they don't do anything nasty, or do they first check if they are allowed to enter and move freely? We use "signatures" continuously.

        A safe system obviously comes with some limitations. Defense in depth requires multiple levels of checks - even before an application is loaded into memory and the entry point called. Otherwise it may be already too late.

        Otherwise, please, don't use HTTPS. It's based on signatures too. If you can't trust the other party, encryption becomes useless: anyone can easily MITM the exchange and you wouldn't be able to know. Don't check hashes of what you download. It's a manual signature check. APT uses keys and signatures as well. Get rid of them, let APT download and install whatever you feed it.
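
The "check hashes of what you download" step can be sketched in a few lines of Python. `sha256_matches` is a name invented for this example, and the published value is computed locally here only to keep the snippet self-contained. The check is only as good as the channel the expected hash came from, which is the commenter's point about needing signatures.

```python
import hashlib
import hmac


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the SHA-256 of data against a published hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Stand-ins for a downloaded file and the digest published alongside it.
download = b"pretend this is an installer"
published = hashlib.sha256(download).hexdigest()

print(sha256_matches(download, published))             # intact copy
print(sha256_matches(download + b"\x00", published))   # altered copy
```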

        Is MS keeping the SecureBoot keys wrong? Sure. It should be an independent organization.

        But of course some are blinded by their activism and political agenda (and often, plain greed) and can't see the elephant in the room.

        Meanwhile security-aware OSes and applications do sign and check what they run and download...

        1. Had_to_be_said

          Re: Looks sweet ...without the pseudo-security... of lock-in.

          First, as to "signing"; its "security" is basically verifying that the original code is from "trusted" (I.E. "authorized") sources, without modification. The problem is that if any code, signed, or not, has these types of inherent flaws... which -IS-, and -HAS BEEN-, the case... time, and time, again... then "signing" simply becomes a means of "locking-in" code distribution, to "authorized"... I.E. "licensed"... software manufacturers. And, frankly, those interests who continually push this ("code signing") the most, actually have a long and very-well documented history of ABUSING, exactly this type of LOCK-IN. As well as, having an absolutely terrible history of releasing, and "signing" such flawed code, over, and over, again. So, there is -no- real "security" there, at all.

          In short, flawed, (but, officially-authorized) code -IS- one of the major sources of software security compromises. And, this particular "security" fix [hardware-based STACK Bounds-checking] is aimed directly at protecting against such "flawed code", itself... without allowing a clearly demonstrated platform for further commercial "abuse".

          And finally, frankly... based upon all the facts... it is completely nonsensical and offensive to make any assertion that the, vast numbers of highly-experienced, people who have come to oppose such "Signing", and "trusted computing", lock-in... are actually, simply wanting to be able to "steal", and/or compromise, software.

          1. Anonymous Coward

            Re: Looks sweet ...without the pseudo-security... of lock-in.

            You just confirmed it when you started to whine about licensing and lock-in while failing to understand that a lot of safe FOSS code distribution mechanisms use some sort of signing too. But if you only authenticate the download and not the code on disk, which may be altered, you still have a hole. Sorry if it has the side effect of hindering cracks. Just like you can alter executables to allow for unlicensed use, you can alter them to become malware, and no CPU feature can protect from it once loaded.

            If you use FOSS code, even if signed, you don't have a lock-in issue; licenses still apply, although many people like to ignore FOSS licenses too, especially the GPL.

            1. Anonymous Coward

              Re: Looks sweet ...without the pseudo-security... of lock-in.

              "Sorry if it has the side effect of hindering cracks. Just like you can alter executables to allow for unlicensed use, you can alter them to become malware, and no CPU feature can protect from it once loaded."

              Sad. Just sad. Ignoring the point to defend lockin is just really sad. And, you know what they say about doing the same thing again, and again, for the same result?

              1. Charles 9

                Re: Looks sweet ...without the pseudo-security... of lock-in.

                "Sad. Just sad. Ignoring the point to defend lockin is just really sad. And, you know what they say about doing the same thing again, and again, for the same result?"

                Yes. Doing the same thing over and over and actually getting a different result is PRAISED. It's called persistence.

                1. Had_to_be_said

                  Re: Looks sweet ...without the pseudo-security... of lock-in.

                  Uh... You do realize that doing the -EXACT- same thing, when it not only fails, but can be proven, logically, to be completely-erroneous (as well as extremely detrimental)... is -NOT- "persistence". It is pathologically-ignorant psychosis.

                  But then, I guess that's why "bleeding the bad humors" out of desperately ill patients... until they died (from the "cure")... lasted so long in medical history: some seriously ignorant people supported it.

                  Finally, by the way, the legal definition for someone whose delusion is clearly a demonstrable danger to themselves, and others, is... actually... "criminal insanity". Just sayin'

  8. Anonymous Coward

    "If they don't match, then an exception is raised..."

    .."allowing the operating system to catch and stop execution."...

    ~ This is where I begin to wonder, if the part above can be borked by a 2-pronged attack...??? Hackers first find a loophole in the 'shadow watcher' and tackle that (sidestep CET)... Then they focus on classic buffer overflow or similar attacks etc...

    ~ Zooming out to macro level for a sec... The Bangladesh-Swift Sony hackers, intercepted the return confirm and manipulated that to make it seem like the transfers were legit (hence no exception thrown). Could the same happen here?

    ~ What I'd like to see is a hardware based LOCK system to prevent any manipulation of code whether it's on hard-drive or in memory. The whole idea of self-modifying code is a disaster anyway. But imagine code that was fixed like an original DVD. That's how it should be from disk to memory. Data on disk should be separated from code permanently and at a hardware level, so any weakness in OS can't be exploited.

    ~ The process of installing apps i.e. pressing a fixed DVD, would need to be a special process. You don't want it to be cumbersome for users but it also can't continue down the path of silent installs.

    ~ But how to actually achieve this in practice, and am I dreaming? ... Maybe. It's been a while since my CS days... So honestly I don't pretend to have a great solution here, just throwing about ideas.

    ~ But if a program wanted to install itself or update code, I'd like to think that the user would be forced to do something physical like insert a master USB key / turn a physical key, something external, so that users better appreciate that what we have right now with UAC is oversight done by painting in water.

    1. Charles 9

      Re: "If they don't match, then an exception is raised..."

      "This is where I begin to wonder, if the part above can be borked by a 2-pronged attack...??? Hackers first find a loophole in the 'shadow watcher' and tackle that (sidestep CET)."

      To do that, they'd have to find a hardware exploit since they're talking about something directly in the CPU.

      "Zooming out to macro level for a sec... The Bangladesh-Swift Sony hackers, intercepted the return confirm and manipulated that to make it seem like the transfers were legit (hence no exception thrown). Could the same happen here?"

      Only again by a hardware exploit since that would involve intercepting a memory bus, something much harder to do than a network or device bus.

      "What I'd like to see is a hardware based LOCK system to prevent any manipulation of code whether its on hard-drive or in memory. The whole idea of self-modifying code is a disaster anyway. But Imagine code that was fixed like an original DVD. That's how it should be from disk to memory. Data on disk should be separated from code permanently and at a hardware level, so any weakness in OS can't be exploited."

      You're calling for a Harvard architecture. But much as you hate self-modifying code, it's essential in certain restricted environments or those where speed is essential. Without the idea that code is data and data is code, you couldn't have things like a JIT compiler, for example.

      "The process of installing apps i.e. pressing a fixed DVD, would need to be a special process. You don't want it to be cumbersome for users but it also can't continue down the path of silent installs."

      Problem here is that you run into an Unhappy Medium. Since Users are Stupid (and You Can't Fix Stupid), there's a need for silent installs of mission-critical stuff like security patches. Meaning you have an overlap where NO ONE is happy.

      "But if a program wanted to install itself or update code, I'd like to think that the user would be forced to do something physical like insert a master USB key / turn a physical key, something eternal, so that users better appreciate that what we have right now with UAC is oversight done by painting in water."

      And then people just lose their keys and complain.

  9. Anonymous Coward

    It'd be nice to have a system...

    * Where every PC was just a VM...

    * Or as per Linux: bootable from DVD and just run live etc.

    * Then once the system is set-up to your liking you deep freeze it.

    * How many problems arise just because of minor day-to-day config glitches...

    * I often go a year or two without installing or uninstalling anything of importance. So for this kind of use, why should everything be so fluid? It would probably mean killing off the registry, but is that a bad thing???

    * Every OS should have a Commit-Changes / Go-Back-To-Yesterday setting imho...

    1. Matthew 3

      Re: It'd be nice to have a system...

      Every PC is a VM? What would you run those VMs on? ;-)

      "It's turtles all the way down!"

      1. Peter Gathercole Silver badge

        Re: It'd be nice to have a system...

        What do you run the VMs on? Intel Mainframes of course.

      2. David 132 Silver badge

        Re: It'd be nice to have a system...

        "It's turtles all the way down!"

        And the turtles run on Logo.

      3. bombastic bob Silver badge

        Re: It'd be nice to have a system...

        "Every PC is a VM? What would you run those VMs on?"

        some kind of hypervisor, apparently not a bad concept. but a hypervisor has its inherent problems, too (recent vulnerabilities in "ring -2" as I recall). you're just kicking it down the road.

        if you want to put out a fire, you break the 'fire triangle' (fuel, heat, oxidizer). The 'shadow stack' does that, in a way. So does code address randomization, by the way...

    2. Anonymous Coward

      Re: It'd be nice to have a system...

      If you would like to do this on Linux, a solution exists.

      In essence, compress your desired root filesystem into a SquashFS file, e.g. rootfs.squashfs.

      Using an initial RAM disk (initrd), mount an OverlayFS root with the SquashFS image as the read-only lower layer and a tmpfs mount for the writable portion. This leaves you with a system composed of a kernel, initrd, squashfs-root and some space for changes. When you want to "commit" changes, simply update rootfs.squashfs with your new changes.

      This will run from memory and is effectively a Live CD.

    3. Curious

      Re: It'd be nice to have a system...

      Nope, don't think that's sufficient, with so many long-running systems.

      The vulnerable applications should be virtualised, in their own bubble where they don't ooze all over the operating system core and registry, nor other applications, and interaction with the file system / network is through an application specific proxy that looks for unusual patterns of traffic.

      The major vendors all have their technology for this (App-V, XenApp, ThinApp).

      Microsoft in particular has failed to popularise this capability, only now looking at building the client into Windows 10. Currently it's only for deployment by large enterprises that have bought access via the vileness of Microsoft's Software Assurance volume licensing to get MDOP.

      It should have been the cornerstone of their Windows Store, alongside or instead of the AppX throwaway stuff.

      1. Anonymous Coward

        Re: It'd be nice to have a system...

        "The vulnerable applications should be virtualised, in their own bubble where they don't ooze all over the operating system core and registry, nor other applications, and interaction with the file system / network is through an application specific proxy that looks for unusual patterns of traffic."

        Then they just attack the proxy or slip something through that the proxy doesn't detect. Why do you think sandboxes are so last season? Pretty sure a VM escape exploit will be coming soon.

  10. jms222

    Remember that non-x86 CPUs had the non-executable stacks/areas that Intel forgot to put into the 386 decades ago; coming late to the game, Intel has made a big thing of it with marketing.

    1. Anonymous Coward

      I don't think it was that. Harvard architectures have their faults, mainly that you can't run a JIT compiler on them (and remember that in the '90s, speed was more important due to computational limitations).

    2. Daniel B.

      Indeed. x86 has always been the less capable architecture out there.

    3. bombastic bob Silver badge

      non-executable flags on 386

      as I recall, protected mode had this, but you had to NOT alias the code area with a corresponding data area. Unfortunately, windows *DID* just that. 32-bit 'flat model' was no exception (there were a couple of 32-bit global selector entries available for that). I had a utility for peeking into the internals of win 3.x and '9x that would leverage that global selector. I'd create call gates and jump to internal operating system functions inside of drivers to get certain kinds of system information. It was kinda cool, but I ALSO recognized how vulnerable the systems were, because someone "not me" could do the SAME! THING! for nefarious purposes.

    4. Anonymous Coward

      Actually, since the 80286 each segment descriptor can have read/write/execute flags (see http://www.intel.com/Assets/en_US/PDF/manual/253668.pdf). You can have segments which are execute-only (which means an application can neither read nor write them, but the CPU can still execute the code within), segments which are readable and writable but not executable, segments which are read-only...

      The issue is that OS designers (Linux included, AFAIK) decided to adopt a too-simple (and insecure) flat model where all segment registers for an application point to the same base address and the limit is set to the largest value - then paging is used to map physical memory in and out of the address space. So if you have a read/write segment descriptor which allows access to the same memory as an executable segment descriptor (and vice versa), you can do whatever you like - but the culprit is the OS design, not the CPU one.

      Anyway, ROP doesn't put *code* on the stack, it just manipulates return addresses to execute existing code. What you need is a stack which the application can't modify, and that's what Intel is talking about.
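
      A rough way to picture the check being discussed, as a toy model in Python rather than anything resembling Intel's actual design: CALL records the return address twice, ordinary stores can reach only one of the copies, and RET refuses to continue when the two disagree. The class and method names (ToyCPU, call, store, ret) and the addresses are made up for illustration.

```python
class ControlFlowViolation(Exception):
    """Raised when the return address on the data stack differs from the shadow copy."""


class ToyCPU:
    """A toy model, not Intel's design: two stacks, one of them unreachable by stores."""

    def __init__(self):
        self.stack = []         # ordinary stack: return addresses live next to local data
        self.shadow_stack = []  # shadow stack: only call() and ret() touch it

    def call(self, return_address):
        # CALL pushes the return address onto both stacks.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def store(self, index, value):
        # An ordinary store (e.g. a buffer overflow) can reach only the data stack.
        self.stack[index] = value

    def ret(self):
        # RET pops both copies and refuses to continue if they disagree.
        target = self.stack.pop()
        expected = self.shadow_stack.pop()
        if target != expected:
            raise ControlFlowViolation(
                f"return to {target:#x}, shadow stack says {expected:#x}")
        return target


cpu = ToyCPU()
cpu.call(0x401000)
clean_return = cpu.ret()           # untouched return address: allowed

cpu.call(0x401000)
cpu.store(-1, 0xDEADBEEF)          # attacker overwrites the saved return address
try:
    cpu.ret()
    blocked = False
except ControlFlowViolation:
    blocked = True                 # the "exception is raised" case from the article
```

      The point of the sketch is only that the attacker-controlled store never gets near the second copy, so a rewritten return address is noticed at RET time.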

  11. Warm Braw

    Gradual retreat of the von Neumann machine...

    While modern CPUs may still have a single physical memory bus, it's interesting how we're moving towards a virtual Harvard architecture in which the program and its associated control structures are logically separate from the data.

    This latest change is not exactly a simple and elegant fix - it's not just shadow stacks, but a new "endbranch" instruction (which borrows a NOP from existing CPUs), so you'll need to recompile programs to get the full benefit.

    At some point, it might be nice to start with a clean slate. Perhaps Windows 10 is providing that opportunity, albeit as an unintended consequence...

    1. Brewster's Angle Grinder Silver badge

      Re: Gradual retreat of the von Neumann machine...

      Or back to a code segment, a data segment, and a stack segment. And because different index registers defaulted to different segments, you tended to stick with it. The thing was, we all hated it. Ideally you had all four segments pointing at one 64K window (a .com file). Otherwise, it was lots of jiggery pokery. The 386 was a breath of fresh air because you could use a flat memory model. But, as I understand it, permissions were attached to the segment, hence the need for the NX bit on the page table.

  12. anthonyhegedus Silver badge

    this is all very well but...

    ...this only protects against exploits. This doesn't take into account human stupidity. If a computer user wants to open an executable in a zip file that comes in an attachment, they will. If that executable wants to encrypt all the documents on the machine, then it will. What we ALSO need, apart from clever CPUs, is clever OSes. Why can't a watchdog check what file operations are ongoing and warn the user ("more than 5 files got modified in the last 0.1 seconds - are you sure this is what was intended") or warn against running executables in zip files ("warning: what you are about to open is a program, not a document").

    And how about getting rid of the feature in Windows that hides the file extension, so "file.doc.exe" doesn't show as "file.doc"?

    I'm pretty sure that better Antivirus hooks and cleverer email programs (so obviously not Outlook then) are the key to reducing malware attacks.

    1. Charles 9

      Re: this is all very well but...

      "more than 5 files got modified in the last 0.1 seconds"

      This can happen when you copy a bunch of small files. Too much risk of false positives resulting in click fatigue (think UAC). Also, smarter malware can just "smurf" and encrypt things slowly to stay under the radar.
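
      Both halves of that objection can be shown with a small sliding-window counter. This is only a sketch of the proposed rule ("more than 5 files in 0.1 seconds"); the class name ModificationWatchdog and the timestamps are invented for the example, and a real OS hook would look nothing like this.

```python
from collections import deque


class ModificationWatchdog:
    """Flags a burst: more than `limit` file modifications within `window` seconds."""

    def __init__(self, limit=5, window=0.1):
        self.limit = limit
        self.window = window
        self.recent = deque()   # timestamps of modifications still inside the window

    def record(self, timestamp):
        """Record one modification; return True if the burst threshold is exceeded."""
        self.recent.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while timestamp - self.recent[0] > self.window:
            self.recent.popleft()
        return len(self.recent) > self.limit


def alarms(timestamps):
    """Count how many modifications in a trace would have raised a warning."""
    watchdog = ModificationWatchdog()
    return sum(watchdog.record(t) for t in timestamps)


# A legitimate copy of 20 small files, one every 5 ms: warns again and again.
bulk_copy = [i * 0.005 for i in range(20)]
# Malware encrypting one file per second: never crosses the threshold.
slow_encryptor = [float(i) for i in range(20)]

copy_alarms = alarms(bulk_copy)
malware_alarms = alarms(slow_encryptor)
```

      The harmless copy nags the user repeatedly while the patient encryptor sails through, which is the trade-off described above.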

      "warn against running executables in zip files"

      They already do that as far as it goes. It warns against running files just downloaded (shows the signature if it has one), warns against running things off a network, and so on.

      "And how about getting rid of the feature in Windows that hides the file extension, so "file.doc.exe" doesn't show as "file.doc"?"

      That's mainly to prevent unintentional extension altering, which casual users may not have the skill to undo. Anyway, e-mail programs and archive managers (the main conduits for this trick) show the extensions.

      "I'm pretty sure that better Antivirus hooks and cleverer email programs (so obviously not Outlook then) are the key to reducing malware attacks."

      Smarter malware targets and disables these or just goes above them straight to the kernel where it can't be dislodged. Some even goes into the BIOS, MBR, or EFI, making it nuke-proof.

      But in the end, as you say, until a better human comes along, this is the best we can do.

    2. Anonymous Coward

      Re: this is all very well but...

      "And how about getting rid of the feature in Windows that hides the file extension, so "file.doc.exe" doesn't show as "file.doc"?"

      It's not a feature, it's a profoundly stupid default setting that I compulsively "fix" on any machine I use. The thing you're looking for is Control Panel -> Folder Options -> "View" tab -> "Hide extensions for known file types" or take Tools -> Folder Options from the menu in any Explorer window. I think Win8+ have an appropriate checkbox somewhere within the ribbon.
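
      For what it's worth, the "file.doc.exe" trick is easy to spot mechanically. A minimal sketch, assuming hand-picked extension lists (both sets are illustrative, not any product's real list) and a simplified displayed_name() that hides whatever the final extension is:

```python
import os

# Illustrative lists only; a real tool would need far more complete ones.
EXECUTABLE_EXTENSIONS = {".exe", ".scr", ".com", ".bat", ".cmd", ".js", ".vbs"}
DOCUMENT_EXTENSIONS = {".doc", ".docx", ".pdf", ".xls", ".xlsx", ".jpg", ".txt"}


def displayed_name(filename):
    """Roughly what a file manager shows when the final extension is hidden."""
    stem, _ext = os.path.splitext(filename)
    return stem


def looks_deceptive(filename):
    """True if an executable hides behind a document-like inner extension."""
    stem, ext = os.path.splitext(filename.lower())
    _, inner_ext = os.path.splitext(stem)
    return ext in EXECUTABLE_EXTENSIONS and inner_ext in DOCUMENT_EXTENSIONS


shown = displayed_name("file.doc.exe")      # what the user sees with hiding on
suspicious = looks_deceptive("file.doc.exe")
```

      A mail client or archive manager could run a check like looks_deceptive() before offering to open an attachment, whatever the Explorer setting is.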

      1. Charles 9

        Re: this is all very well but...

        "It's not a feature, it's a profoundly stupid default setting"

        One problem. You're also talking about stupid users. Unless a license becomes compulsory for something that operates in the privacy of one's home ("Papers, please!"), you've got a pretty nasty problem.

  13. Anonymous Coward

    FWIW, Check Point is already using Intel CPU technology to detect and block ROP exploits in their sandbox offering (SandBlast).

  14. Bob Dunlop

    Separate return and data stacks anyone

    I'm sure I remember working with processors that had separate code return and data stacks back in the 80s. All you would have to do is protect the return stack from all but call and return instructions.

    Actually now I think about it the "shadow stack" is just like the old code return stack. They've just left a copy of the return address on the data stack as well so as not to upset compilers/debuggers that expect to see it there.

    1. Anonymous Coward

      Re: Separate return and data stacks anyone

      I assume the reason they didn't do this is because of compatibility concerns. Too bad AMD didn't do a separate code return stack in AMD64, then 32-bit code could be compatible and less secure, but all 64-bit x86 code from day one would have been secure without all the additional overhead of this scheme.

  15. Mike 16

    One layer down, infinity to go.

    First off, the folks calling for "Harvard architecture" have apparently missed the part where this Return Oriented Programming does not alter program memory. It alters the sequence in which bits and pieces of the program memory are executed. It's the answer to things like the NX bit and previous attempts to control access to program memory, which were undermined by the "Wah! We don't like having to figure out what bits of our applications go together. We want one big slab o' bytes and we want to fiddle _all_ of them" attitude of an army of programmers. Water under the bridge.

    But more germane to my subject line, this addition (like the NX bit) only addresses the issue of ROP-enabled malware at the machine-language level. What percentage of software (by source file count or such, not revenue or cycles) is today written to run on the JVM (which has, IIRC, at least two implementations) or is JITed from some scripting language? That's a serious question, not snark. My impression is "quite a lot".

    In what sense are such virtual machines immune from such attacks? Yes, I know about the bytecode validators, and the various attacks that bypass them. Yes, I know that nobody sane globally enables ActionScript or Java Applets in the browser, or allows them at all unless their bank or employer demands it. But browsing the web with JavaScript totally disabled, while restful, is limiting. And someday you or someone at your firm or household will just _have_ to see that charming Flying Pig video, and BANG!

    1. Charles 9

      Re: One layer down, infinity to go.

      "In what sense are such virtual machines immune from such attacks?"

      Well, at some point, the code MUST go through the CPU, meaning it should be able to screen even these. If malicious bytecode or interpreted code causes the compiler or interpreter (both of which are native) to act funny, this should catch it. Anything else and you're looking at high-level malware which will likely have a few other catches involved, but even then if high-level malware is trying to exploit the lower-level stuff, this can still act as a safeguard.

    2. Anonymous Coward

      If you can't fix everything, fix nothing?

      Sure, this addresses only one type of attack. However, it closes off that attack completely - in hardware - so you don't have to worry about it anymore. It may be less common now but it is one more trick in the toolbag for breaking into systems that gets taken away. I'd call that a win, even if there is still a lot of other stuff in the toolbag that remains.

  16. Cynic_999

    Would also bork legitimate code

    I've frequently written code where a 2nd level subroutine deliberately pops off its return address when a terminating condition is met so as to return to the original caller rather than the 1st level subroutine. It avoids the need to do a condition test after every call from the 1st to the 2nd level subroutine to see if it should return. This safeguard would appear to prevent that type of method (which is reasonably common in assembler programs, perhaps not so much in high level languages).

    1. patrickstar

      Re: Would also bork legitimate code

      This comes with a very big performance penalty on modern x86 CPUs, so no one should be doing it in new code anyway. The CPU actually keeps a stack of return addresses around so it can predict where the RET goes. Anything except CALL/RET that fiddles with return addresses will cause it to become unbalanced, with a pretty hefty performance penalty.
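
      To make the "unbalanced" part concrete, here is a toy model of such a return predictor in Python. It is a sketch under simplifying assumptions (unbounded depth, one misprediction counted per wrong guess); real hardware differs in the details, and the names ReturnPredictor, MAIN and FIRST are invented for the example.

```python
class ReturnPredictor:
    """Toy return-address predictor: CALL pushes a guess, RET pops and compares."""

    def __init__(self):
        self.predictions = []
        self.mispredictions = 0

    def call(self, return_address):
        self.predictions.append(return_address)

    def ret(self, actual_target):
        predicted = self.predictions.pop() if self.predictions else None
        if predicted != actual_target:
            self.mispredictions += 1


# Hypothetical return addresses inside main and inside the 1st-level subroutine.
MAIN, FIRST = 0x100, 0x200

# Balanced code: every CALL is matched by its own RET.
balanced = ReturnPredictor()
balanced.call(MAIN)      # main calls the 1st-level subroutine
balanced.call(FIRST)     # 1st level calls the 2nd level
balanced.ret(FIRST)      # 2nd level returns to the 1st level
balanced.ret(MAIN)       # 1st level returns to main

# The trick: the 2nd level discards its own return address and returns to main.
shortcut = ReturnPredictor()
shortcut.call(MAIN)
shortcut.call(FIRST)
shortcut.ret(MAIN)       # the predictor expected FIRST, so this one is a miss
```

      After the shortcut the predictor has guessed wrong once and is still holding a stale entry for a return that will never happen, so a later RET further up the call chain gets compared against the wrong guess too.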

      1. Ken Hagan Gold badge

        Re: Would also bork legitimate code

        By "modern", I assume you are referring to anything with out-of-order execution and branch-prediction, which means almost every x86-class CPU designed since the mid-90s. (If I remember correctly, Intel made a few in-order Atoms about 10 years ago.)

        This sounds like it would have been a nice optimisation for hand-tuned inner loops in the 1980s and possibly standard-operating/optimisation-procedure in the 1970s or before. It's 2016 now and you could probably run that 1970s code in a VMM that was written in JavaScript and still be faster than the CPU you originally optimised for.

    2. Number6

      Re: Would also bork legitimate code

      Back in the days of the 8-bit processor, I remember writing code (in assembler!) that would implement a 16-bit jump by pushing the target onto the stack and doing a ret. Not all processors needed it, the 8080/Z80 has a JP (HL) instruction so that one didn't need to involve the stack, although it also had the EX (SP),HL instruction to make it easy to manipulate the stack content.

      1. bombastic bob Silver badge

        Re: Would also bork legitimate code

        "Back in the days of the 8-bit processor, I remember writing code (in assembler!) that would implement a 16-bit jump by pushing the target onto the stack and doing a ret."

        on SOME processors, you STILL have to do that. I'm thinking 'microcontrollers' at the moment. There's no 24-bit jump instruction on an AVR, but some AVRs have 24-bit addressing. So the fastest way to jump 24-bit is to leverage the 24-bit program counter value on the stack after a call. I am pretty sure there's a 24-bit CALL function, however [can't recall at the moment]. Just no JUMP instruction. So when my bootloader does a jump to the start of code, while running within the highest 128kb page of memory, it must do the 3-byte address stack push followed by 'RET'. With '#ifdef' around it for CPUs that have a < 128kb address space. It works.

  17. anonymous boring coward Silver badge

    "block malware infections on computers at the processor level"

    Aren't all infections happening at the processor level?

  18. Rol

    Commodore

    Back in my youth, learning the basics on a 4032 PET at college, we used to write games for each other to play. Unfortunately the games deteriorated into a protection war, as we each tried to stop each other from breaking into the game and gleaning the necessary to win.

    The REM statement came in very handy as several control characters deftly inserted after, would stop all but the most determined from simply listing the crucial bits of code, even printing the list was affected.

    My speciality was to peek into various parts of memory and then use that to poke back into the main code, hence presenting total gobbledegook to snoopers, as the program rewrote itself as it ran.

    What did occur to me, was that as the program was running, it obviously grew, which meant I could check the size along the way for possible hacks, this resulted in a run check string that verified the program was original at multiple points along the way.

    And what was worth all that effort, you know, I can't remember.

  19. Daniel B.

    Intel

    Still playing catchup with the superior architectures from the 90s?

  20. YechiamTK

    Trying to kill fire with fire..

    Although the solution seems legit, I just can't help thinking that it will be merely temporary, until the next guy finds out how to exploit this so-called "shadow stack" (which wouldn't take long I guess). Intel (or any other brilliant company) should come up with a more permanent solution than that, and I'm sure they can - it'll just take some effort from the developers' side to adapt to the changes I think should be made to the CPU. I think assembly should be rethinked (is that a word? lol), and be made more secure FROM THE START. It is hard to do and will take years to achieve and to be implemented in the next generation of computers, but I think this is the best solution the computer industry needs - to be rethinked.

    1. Anonymous Coward

      Re: Trying to kill fire with fire..

      How do you exploit a feature that will be internal to the CPU? That's one reason they're going this route.

  21. Anonymous Coward

    A bad idea

    Like any substantial change to the existing semantics, this is going to break a lot of things. Exception handling is one prominent example: any code relying on the old-school setjmp/longjmp will be buggered. Fixing it would require compiler modifications and complete recompile of each application involved, including the libraries. Even then, there might be corner cases which won't be fixable. One might think of fixing this in an exception handler, but my head starts to hurt even trying to think of how to deal with the recursive routines.

    At this point in time, x86 has become a horrible mess of warts, some of which have warts of their own. Adding yet another level of complexity will further decrease the number of people who actually have an idea of how the whole thing is propped up - and it is already tending to zero as it is.

    1. Charles 9

      Re: A bad idea

      And yet so much software is still written for it. Something must be going for it even if it's pretty damn complex.

      PS. Why would this specifically break setjmp/longjmp? Direct jumping IIRC is not affected by this: only CALL/RET. Do those two functions rely on the stack in an unusual way?

      1. Anonymous Coward

        Re: A bad idea

        "PS. Why would this specifically break setjmp/longjmp? Direct jumping IIRC is not affected by this: only CALL/RET. Do those two functions rely on the stack in an unusual way?"

        Because longjmp() can return to an upstream caller function without going through the intermediate stack frames and executing the corresponding RET instructions. Now when the routine containing the setjmp call returns, the shadow and architectural stacks will be inconsistent, triggering a protection fault. The handler could try to fix it up, but then it would need to figure out whether a) this is a legitimate setjmp/longjmp pair; and b) whether this was a recursive call - in which case it will also have to decide which of the instances of the setjmp() frame it was supposed to refer to.

        Alternatively, the "shadow" stack hardware could also tag its return (E)IPs with the (E)SP from the time CALL happened. This will fix up the setjmp/longjmp problem for single-threaded code, where it is used for exception handling.

        On the other hand, it is easy to come up with other examples where such shadow stack will break perfectly well-behaved applications (e.g. some ways to implement cooperative user-level multi-threading). I am sure these cases can be fixed up as well, but the complexity and overhead of the transparent implementation will inevitably snowball.
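
        The mismatch, and the SP-tagging idea, can be sketched with a toy machine in Python. This models the scheme as described in this comment, not what any shipping CPU does; the Machine class, the 8-byte slot size and all addresses are assumptions made up for the illustration.

```python
WORD = 8  # bytes per stack slot in this toy model


class Machine:
    def __init__(self, tag_with_sp):
        self.sp = 0x8000            # the stack grows downwards
        self.data_stack = {}        # address -> return address (architectural stack)
        self.shadow = []            # entries of (return_address, sp_at_call)
        self.tag_with_sp = tag_with_sp

    def call(self, return_address):
        self.shadow.append((return_address, self.sp))
        self.sp -= WORD
        self.data_stack[self.sp] = return_address

    def ret(self):
        target = self.data_stack[self.sp]
        self.sp += WORD
        expected, _tag = self.shadow.pop()
        if target != expected:
            raise RuntimeError("shadow stack mismatch")
        return target

    def setjmp(self):
        return self.sp              # the saved context is just SP in this model

    def longjmp(self, saved_sp):
        self.sp = saved_sp          # skip the intermediate frames, no RET executed
        if self.tag_with_sp:
            # Discard shadow entries for calls made at or below the restored SP.
            while self.shadow and self.shadow[-1][1] <= saved_sp:
                self.shadow.pop()


def run(tag_with_sp):
    m = Machine(tag_with_sp)
    m.call(0x10)            # main calls handler_frame()
    env = m.setjmp()        # handler_frame() saves its context
    m.call(0x20)            # handler_frame() calls worker()
    m.call(0x30)            # worker() calls deeper()
    m.longjmp(env)          # deeper() bails out straight to handler_frame()
    try:
        return m.ret()      # handler_frame() returns to main
    except RuntimeError:
        return None         # stands in for the protection fault


untagged_result = run(tag_with_sp=False)
tagged_result = run(tag_with_sp=True)
```

        Without the tags the final RET is compared against the stale entry left by deeper() and faults; with them, the two skipped entries are dropped on longjmp and the return to main goes through. The model has a single stack, so it says nothing about the multi-threading cases mentioned above.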

        1. Anonymous Coward

          Re: A bad idea

          "Because longjmp() can return to an upstream caller function without going through the intermediate stack frames and executing the corresponding RET instructions."

          Sounds a lot like bad programming practice which is why I was taught to avoid unstructured jumps like GOTOs unless the language doesn't give you an option. IOW, I would call this a case of "something's broken" and I wouldn't trust that kind of coding in any event unless there was a VERY good reason for it.

  22. Herby

    Maybe it is time...

    To design a different CPU that from scratch has proper checks and the like. With known exploits cataloged it ought to be easy to try various threats against the CPU to see if they still work.

    Oh, make it a big-endian CPU as well!

    1. Charles 9

      Re: Maybe it is time...

      Trouble is every time they try they usually hit a wall: performance demands which necessarily take a hit with things like tagged memory. Who cares about security if the job doesn't get done in time? You can't just get it done right OR get it done fast. You MUST get it right AND fast at the same time or things break...

  23. bombastic bob Silver badge

    address space randomization might help more

    perhaps code address space randomization would help more. In the 64-bit world, this is practical. Just have every instance of a program load with a different start address for 'bottom of code space'. It won't be perfect, but it could be done in SOFTWARE with existing tech. Similar things have been done for network port assignments to help prevent certain kinds of "port predicting" attack vectors.

    that way the code address won't be easily known. You'd need CODE to discover what a function address actually is so that you CAN jump to it, and if you can't run code via your exploit, you can't (easily) get the address to jump to. the RET house of cards falls down.
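
    A back-of-the-envelope simulation of the idea in Python, under loud assumptions: the page size, the 28 bits of entropy, the gadget offset and the function names are all invented for the illustration, and the "loader" is just a random number generator.

```python
import random

PAGE = 0x1000
GADGET_OFFSET = 0x4321      # hypothetical offset of a useful code snippet in the image


def load_image(rng, entropy_bits):
    """Pick a page-aligned base address with `entropy_bits` bits of randomness."""
    return rng.randrange(2 ** entropy_bits) * PAGE


def attack_succeeds(rng, entropy_bits, guessed_base):
    """A hard-coded return address works only if the guess matches the real base."""
    actual_gadget = load_image(rng, entropy_bits) + GADGET_OFFSET
    return guessed_base + GADGET_OFFSET == actual_gadget


def base_from_leak(leaked_pointer, known_offset):
    """If any code pointer leaks, the randomisation is undone by one subtraction."""
    return leaked_pointer - known_offset


rng = random.Random(2016)   # seeded so the run is repeatable
no_aslr_hits = sum(attack_succeeds(rng, 0, guessed_base=0) for _ in range(1000))
aslr_hits = sum(attack_succeeds(rng, 28, guessed_base=0) for _ in range(1000))

secret_base = load_image(rng, 28)
recovered = base_from_leak(secret_base + GADGET_OFFSET, GADGET_OFFSET)
```

    With a fixed base every hard-coded address lands; with a randomised one a blind guess almost never does. The last two lines show the catch: one leaked pointer plus a known offset gives the base straight back.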

    1. Anonymous Coward

      Re: address space randomization might help more

      Doesn't ASLR address this already? And it can be circumvented with things like heap spraying and JIT spraying, or by using basic functions (IOW, known targets) whose addresses need to be known to call them in the first place.
