Old is new again?
Theo from OpenBSD had a rant about Intel and a similar problem in 2007. https://www.theregister.co.uk/2007/06/28/core_2_duo_errata/
And people say I'm crazy for using SPARC.
In the wake of The Register's report on Tuesday about the vulnerabilities affecting Intel chips, Chipzilla on Wednesday issued a press release to address the problems disclosed by Google's security researchers that afternoon. To help put Intel's claims into context, we've annotated the text. Bold is Intel's spin. Intel and …
From BBC FONUIZ: Rush to fix 'serious' computer chip flaws
Typically, both parties agree not to publicise the problem until a fix has been implemented, so that criminals cannot take advantage of the issue.
This time it looks like somebody jumped the gun and information was leaked before a software fix was ready for distribution.
I haven't heard about anyone "this time" mysteriously jumping any gun, what do they mean by that?
"This time it looks like somebody jumped the gun and information was leaked before a software fix was ready for distribution."
Google have got form for disclosing security flaws before fixes are ready.
I can't help wondering if they have some new AMD-powered Chromebook waiting in the wings to hit the market in a couple of weeks' time...
"Google have got form for disclosing security flaws before fixes are ready."
Giving the other party 90 days to fix, followed by them going into radio silence up to and beyond the 90 days, isn't quite the same as what happened here.
Seems more likely that these bugs affected so many systems that many more folk needed to be told. When so many people have a need to know, a leak becomes almost inevitable.
> I haven't heard about anyone "this time" mysteriously jumping any gun, what do they mean by that?
As the article reads like a cut 'n' paste job, it's not worthwhile trying to analyse it. The layers of obfuscation are geological in complexity: suggestions that it's just a software issue, poor Intel with some fearful leaker being nasty to them, suggestions of it all being sorted if only they'd had time, etc etc. Even El Reg has had to conflate two separate issues, and each is complex. Expect colleagues, family and friends to start conversations with you today with "I don't understand what they mean..."
Positive, he's the only person running it :-)
Nah. We have a couple of old[1] SPARC-based E450's in the computer room doing old[2] Oracley stuff. Fortunately, one of them just got supplanted by a system upgrade so I'll be able to switch it off soon.
[1] Can anyone else remember when the E450's were new and oooh-shiny? I can..
[2] As in 'written last century' old. Or as we like to call it "johnny-come-lately new fangled stuff"
Actually, because of the design of the SPARC processor, it is immune to Meltdown. To be technical, unlike x86, SPARC processors have a separate TLB (Translation Lookaside Buffer) for kernel pages only. That's the source of the slowdown on x86. With the kernel fully removed from the process address space, a full context switch is needed because now you are fully switching address spaces, and the TLB contents are dumped. For a TLB miss, it takes 2-5 memory accesses to read in a page table entry, and at roughly 20ns access time compared to sub 1.0ns access time for a cache hit, you are looking at a performance hit that is two orders of magnitude slower than a cache hit. In case you are wondering, the TLB is the cache that is used by the memory management unit for translating virtual addresses into physical addresses.
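To put rough numbers on the TLB-miss claim above, here is a back-of-the-envelope sketch in Python. The figures (2-5 memory accesses, roughly 20ns each, about 1ns for a cache hit) are the ones quoted in the comment, not measurements, and the function names are made up for illustration.

```python
# Figures taken from the comment above; real hardware varies.
MEMORY_ACCESS_NS = 20.0   # "roughly 20ns" per memory access
CACHE_HIT_NS = 1.0        # upper bound of "sub 1.0ns"
WALK_ACCESSES = (2, 5)    # "2-5 memory accesses" per TLB miss


def tlb_miss_cost_ns(accesses, access_ns=MEMORY_ACCESS_NS):
    """Time to read in one page table entry after a TLB miss."""
    return accesses * access_ns


def slowdown_vs_cache_hit(accesses, hit_ns=CACHE_HIT_NS):
    """How many times slower a TLB miss is than a single cache hit."""
    return tlb_miss_cost_ns(accesses) / hit_ns


best, worst = (slowdown_vs_cache_hit(n) for n in WALK_ACCESSES)
print(f"TLB miss: {tlb_miss_cost_ns(2):.0f}-{tlb_miss_cost_ns(5):.0f} ns, "
      f"{best:.0f}x-{worst:.0f}x a cache hit")
```

With those inputs a miss costs 40 to 100 ns, so "two orders of magnitude" is the top of the range rather than the typical case.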
The Sparc may or may not be vulnerable, but I don't think this explanation covers it.
The Intel issue comes from a speculative code path loading data into the L1 cache from mapped but protected memory. This has nothing to do with TLBs -- the real question is whether the kernel address space is accessible to speculatively executed instructions.
Intel's error came from allowing a speculative load to proceed even when the current ring number did not permit the access to complete.
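A toy model of that ordering problem, sketched in Python. Nothing here touches real hardware: a set stands in for the L1 cache, the "timings" are made-up constants, and `ToyCPU`, `user_read_kernel` and `recover_byte` are hypothetical names. It only illustrates why a load that is later thrown away can still leak a value through cache state.

```python
# Toy model only: a Python set stands in for the L1 cache and the
# "timings" are made-up constants, not measurements.
CACHE_HIT_NS = 1
CACHE_MISS_NS = 100
LINE = 64  # assumed cache line size in bytes


class ToyCPU:
    def __init__(self, kernel_memory, check_before_load):
        self.kernel_memory = kernel_memory          # addr -> byte value
        self.check_before_load = check_before_load  # True = safe ordering
        self.cache = set()                          # cached probe-line offsets

    def user_read_kernel(self, addr):
        """User-mode read of a kernel address plus one dependent load."""
        if self.check_before_load:
            raise PermissionError("fault before any data is loaded")
        # Flawed ordering: the value is used speculatively first...
        secret = self.kernel_memory[addr]
        self.cache.add(secret * LINE)  # dependent load touches one probe line
        # ...and the privilege check only takes effect afterwards. The
        # result is discarded, but the cache line stays loaded.
        raise PermissionError("fault, but cache state already changed")

    def probe_time(self, offset):
        return CACHE_HIT_NS if offset in self.cache else CACHE_MISS_NS


def recover_byte(cpu, addr):
    """Attacker side: trigger the fault, then time all 256 probe lines."""
    try:
        cpu.user_read_kernel(addr)
    except PermissionError:
        pass  # the attacker catches or suppresses the fault
    timings = [cpu.probe_time(v * LINE) for v in range(256)]
    fastest = min(range(256), key=timings.__getitem__)
    return fastest if timings[fastest] == CACHE_HIT_NS else None


flawed = ToyCPU({0xFFFF0000: 0x2A}, check_before_load=False)
fixed = ToyCPU({0xFFFF0000: 0x2A}, check_before_load=True)
print(recover_byte(flawed, 0xFFFF0000), recover_byte(fixed, 0xFFFF0000))
```

In the model, the only difference between the two CPUs is whether the permission check happens before or after the dependent load, which is the point being made above.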
SPARC systems based on SPARC V9 (since 1993) have kernel page table isolation, meaning kernel and process address spaces are separate, so they should not be affected by Meltdown. Here's the documentation: https://books.google.se/books?id=3Ys27a_I1tEC&pg=PT552&lpg=PT552&dq=address+spaces+on+SPARC+systems&source=bl&ots=5mwktQtlHt&sig=mIu8LXvWq1LbN3kqS6PWP-5ldvU&hl=sv&sa=X&ved=0ahUKEwin_9OD2r7YAhXQJ1AKHUcIA38Q6AEIRjAD#v=onepage&q=address%20spaces%20on%20SPARC%20systems&f=false
If you patch your hypervisor and guest OS, do you take a double hit on performance?
Also, according to Red Hat's security note, Spectre affects all CPUs that use speculative execution, including AMD, ARM, and POWER architectures. Are they incorrect?
Finally, if you're going to label one vulnerability Spectre, why not call the other one SMERSH?
Inquiring minds want to know!
"If you patch your hypervisor and guest OS, do you take a double hit on performance?"
As the bug is all about preemptive execution, which happens at the microcode level, you should only have to patch the hypervisor. Linux at least is allowing you to turn off the patch, presumably so the guest OS does not have to take the hit as well.
As for Windows, that is anyone's guess atm!
Good question. MS are seeming to indicate that if the Hypervisor is patched, the guest is protected. So will the OS detect that it's running on a patched OS and not "double implement" the memory protection?
https://azure.microsoft.com/en-us/blog/securing-azure-customers-from-cpu-vulnerability/
The hypervisor patch protects the hypervisor provider, not the guest OS user.
The guest isn't protected until it is patched.
Azure/AWS/Goog etc are all protecting their infrastructure from pivots between customers. If customers don't patch their own systems they are unprotected. Remember cloud providers protect the cloud, customers protect themselves.
There shouldn't be a double performance penalty though.
Not 100% sure, but I don't think that is a correct analysis. If it's a hardware bug patched in hypervisor software, you don't need to patch virtual servers built on top, unless the virtualization software has the same bug as the hardware.
I.e. if the hypervisor flushed the data that was speculatively executed, it's not there for the VM to abuse either.
If only the hypervisor needs to be patched, why are Microsoft rebooting affected Virtual Machines on Azure? Surely they can migrate them between patched hosts?
MS being lazy, or something else?
PS, I'm a software guy, not hardware or OS, so be kind...
Not being at Microsoft I couldn't say for sure, but my guess would be the scale of the problem. They might just not have enough servers to migrate people en masse that way and have found it's quicker and less risky to have a small amount of downtime to just patch everything immediately.
[Intel CEO is invited to explain his share activities to long time investor E. S. Blofeld]
"We don't take kindly to failure, Mr Krzanich. Security, please show Mr Krzanich my collection of tropical fish. The carnivorous ones will be particularly interesting..."
"In Azure, there is no Live Migration facility. "
Azure has live migration now - you can pay extra for it.
I'd guess that capacity and the scale of the issue is the bigger thing here; they need to patch every box in the datacenter and need to do it fast. Easier to just bounce all the hosts at once than schedule a bunch of VMs cascading across the network.
I was trying to work this out... on the whole I concluded I didn't understand enough to know but if the compiler generates code that does speculative loads aren't you still screwed (although of course you can't blame the processor, just the compiler). See https://blogs.msdn.microsoft.com/oldnewthing/20150804-00/?p=91181
if the compiler generates code that does speculative loads aren't you still screwed
I'd think not, or much less so. Speculative accesses inside the processor that never get committed could perhaps access memory that the user shouldn't have permission to touch, but compiler-generated speculative loads are going to have to use standard instructions, albeit out of order. They will be constrained by the protections available to whatever context the complete instructions are executed in, i.e. a user process won't be able to access kernel memory.
They're also easier to fix, either by a compiler patch or perhaps some post-processing of binaries.
With the Itanium processor virtual addresses are global: the bits of the kernel have one set of virtual addresses while all user processes have other virtual addresses. The cache tags and the TLB basically know who a bit of address space belongs to. Sadly it's been too many years since I played that far inside the Itanium to remember whether there is a region register which is programmable from user privilege level (like there is one user-programmable space register on PA-RISC), but even if a long virtual pointer were used from assembler the access would be blocked by the TLB's protection mechanism.
The extensive Google Blog post suggests that AMD processors are only "less" vulnerable - and I don't imagine the investigations have yet stopped. Specifically, they found they could use side-channel attacks to get memory contents from the same process on an AMD chip (hardly a big deal, but still a warning flag) and they could read the entire contents of kernel memory on an AMD chip IFF the Berkeley Packet Filter (BPF) Just-In-Time compiler is enabled in the kernel. Is that an AMD bug or a Linux bug? Can you actually assign "blame"?
Since the entire vulnerability is related to the relative execution time of cached and non-cached operations, it's difficult to believe that there are not other potential exploits to be discovered. The BPF issue is interesting because it means that the ability to inject any arbitrary code into the kernel, even code that is statically proven to be "safe" in traditional software terms, is in fact a potential vector for side-channel attacks for which there is no obvious mitigation.
That's a very big deal for a lot of Linux-based firewalls and probably many other applications.
they could read the entire contents of kernel memory on an AMD chip IFF the Berkeley Packet Filter (BPF) Just-In-Time compiler is enabled in the kernel.
The name "Berkeley Packet Filter" should be a give-away - this is part of the firewall in FreeBSD-derived systems, Linux uses a different firewall, as does OpenBSD. This may affect a large number of routers which use BSD-derived code - a very high risk since, in most cases, (a) this is not obvious to the owner/user, and (b) they are very unlikely to be patched.
Routers are a great target for malware - because they are Internet connected and always on.
The good news is that this should be easily patched IF the manufacturer is threatened with sufficiently serious consequences - which may or may not include "cruel and inhuman torture" - IANAL.
Linux uses a different firewall
The Linux kernel supports eBPF since version 3.18 and the exploit was demonstrated using a Debian distribution, though by default eBPF would not be a configured kernel option so it might not be so widely used.
It isn't necessarily an easy patch (for any architecture) and it applies to any JIT code (not just BPF), so I assume there may be some impact on nft for Linux. ARM recommend changing the code emitted by the JIT compiler using new conditional speculation barriers. I'm not sure what options are being proposed for other platforms, but they could well have performance implications of their own aside from those already being discussed.
Bad form, I know, to reply to my own post, but there was one other thing that occurred to me.
Computer architecture has historically assumed that you controlled your computer and the workload that ran on it. The protections in place were there largely to mitigate against mistakes - bugs in your software taking down other software or the computer itself. For the most part, your computer ran software whose behaviour was predictable. The improvements in memory capacity and CPU speed largely depend on that predictability.
The reason that Spectre, Meltdown and, previously, Rowhammer are issues is mostly because the assumption of ownership is no longer valid. Either you have consciously chosen to run your software on someone else's computer (cloud) or it's possible that ownership of what you believe is your own computer has been ceded to criminals (possibly with the backing of state resources).
If you control your own computer, it doesn't really matter if you can read the kernel memory from user space. If you don't, pretty much every statistically-based optimisation (whether it's DRAM stability or branch prediction) is up for exploitation by software that's designed to skew the statistics and either gain knowledge that it shouldn't have or deny service (for example by forcing cache flushes).
Computer architecture hasn't changed a great deal in principle from the days of co-operative time sharing - perhaps it's time it needs to be reinvented for an explicitly hostile environment.
"If you control your own computer, it doesn't really matter if you can read the kernel memory from user space."
Until you go to a website that has some dodgy javascript running in user space (natch), which can then start reading stuff in kernel memory (or presumably anywhere else).
Unless you're writing or auditing every byte of code that runs on your computer, you have to start hoping that you truly control your computer.
Browser vendors are already rushing to prevent these attacks being exploitable from JavaScript. The attacks require precise timing measurements, so they are reducing the precision of timers available to JavaScript. This will make them very hard to exploit from JavaScript.
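A minimal sketch of why a coarser timer hurts the attacker, in Python rather than JavaScript. The 1ns and 100ns durations and the 20,000ns (20 microsecond) resolution are illustrative assumptions, not any particular browser's settings, and the function names are hypothetical.

```python
def coarse_time(t_ns, resolution_ns):
    """Round a timestamp down to a multiple of the timer resolution."""
    return (t_ns // resolution_ns) * resolution_ns


def measured_duration(start_ns, duration_ns, resolution_ns):
    """Duration of an operation as seen through a coarsened timer."""
    end = coarse_time(start_ns + duration_ns, resolution_ns)
    return end - coarse_time(start_ns, resolution_ns)


HIT_NS, MISS_NS = 1, 100   # illustrative cached vs uncached access times
start = 1_000_005          # arbitrary start timestamp in ns

for resolution in (1, 20_000):
    hit = measured_duration(start, HIT_NS, resolution)
    miss = measured_duration(start, MISS_NS, resolution)
    print(f"resolution {resolution} ns: hit={hit} ns, miss={miss} ns")
```

At 1ns resolution the two cases read as 1 and 100; at the coarse resolution both read as 0 for this start time, so a single measurement no longer tells a cache hit from a miss. The sketch says nothing about attacks that average over many samples or build their own timers.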